Geek Speak Blog
There's no place like 127.0.0.1
8 May 2008

The Fast and the Furious - Orion, SQL, and SANs

I get asked a lot about using Orion (which requires SQL as a database backend) with a SAN. This usually comes up when people are also leveraging the Orion NetFlow Traffic Analyzer (NTA) which can cause the database to grow very, very quickly.

Before I get started, let me say that I believe both the product documentation and the official stance of our tech support team are that we don't recommend running Orion w/NTA on a SAN, and for good reason, based on our overall experience in this area. You see, SANs are great for moving and storing very large amounts of data. In many cases you can actually read and write data more quickly to a high-performance SAN than to locally attached disk. The problem is that with applications like Orion you're not moving large chunks of data; instead, you're moving ginormous amounts of itty bitty pieces of data, and most SANs just can't handle that number of I/O transactions in the timeframes that applications like this demand. Time and time again we've seen data get dropped when writing to a high-performance SAN, but after moving the database to even a moderately performing local disk array the problem goes away.
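To put rough numbers on the "large chunks versus itty bitty pieces" point, here is a small Python sketch of the arithmetic. The 50 MB/s figure and both I/O sizes are made up for illustration and are not measurements from Orion or any particular SAN; the function name is also just a placeholder.

```python
def iops_needed(throughput_mb_per_s: float, io_size_kb: float) -> float:
    """Return the I/O operations per second needed to sustain a throughput.

    Illustrative arithmetic only: throughput = IOPS x I/O size.
    """
    if io_size_kb <= 0:
        raise ValueError("io_size_kb must be positive")
    return throughput_mb_per_s * 1024 / io_size_kb


if __name__ == "__main__":
    # Hypothetical workload: the same 50 MB/s moved two different ways.
    large_chunks = iops_needed(50, 1024)  # 1 MB transfers
    small_pieces = iops_needed(50, 8)     # 8 KB transfers
    print(f"1 MB I/Os: {large_chunks:.0f} IOPS")
    print(f"8 KB I/Os: {small_pieces:.0f} IOPS")
```

The same bandwidth needs over a hundred times as many I/O transactions when it arrives in small pieces, which is why a storage system that looks fast on paper for bulk transfers can still fall behind on this kind of workload.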

For example, I recently worked with a customer who was seeing holes in some of the data sets that he was collecting and who was using a SAN to house his SQL database. Additionally, when he queried the database for these results, the queries would sometimes time out. We turned on some perfmon counters on the SQL server and saw disk queue lengths (read and write) of 200-300. Microsoft recommends that for SQL Servers with high amounts of I/O the disk queue lengths not exceed twice the number of physical disks (which in this case was 13 if I remember correctly). After moving the database to a local disk array (RAID 1+0), the problems went away...
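The rule of thumb above is easy to turn into a quick check. The following is a minimal Python sketch, assuming you already have an average disk queue length reading (from perfmon, for instance) and know how many physical disks sit behind the volume. The function names are hypothetical, and the example assumes the "13" above refers to the number of physical disks.

```python
def queue_length_limit(physical_disks: int, per_disk_limit: float = 2.0) -> float:
    """Return the rule-of-thumb ceiling: twice the number of physical disks."""
    if physical_disks <= 0:
        raise ValueError("physical_disks must be positive")
    return per_disk_limit * physical_disks


def queue_length_ok(avg_queue_length: float, physical_disks: int) -> bool:
    """True if the observed queue length is within the rule-of-thumb ceiling."""
    return avg_queue_length <= queue_length_limit(physical_disks)


if __name__ == "__main__":
    disks = 13  # assumption: 13 physical disks, as recalled in the post
    for observed in (7, 200, 300):
        verdict = "ok" if queue_length_ok(observed, disks) else "too high"
        print(f"queue length {observed} vs limit {queue_length_limit(disks):.0f}: {verdict}")
```

Under that assumption the ceiling works out to 26, so readings of 200-300 are roughly ten times over it.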

What inspired me to write this post is that last week at Interop I had a chance to meet with several of the SAN vendors and review some of their newer technology. It seems like SANs may now have evolved to a place where they could be used very effectively in these scenarios and may even outperform local high-speed arrays. I'll have to wait and see, but it definitely seemed promising.

If any of you out there are effectively using SANs in environments like these, please drop a comment with some specifics.

Flame on...
Josh

 


5 Comments

jonchill
15 May 2008 at 2:31AM CST

We run our Orion and Cirrus installs off a SAN, along with IPMonitor. The spec of the blade is 4 gig of memory and a quad-core processor, and we've not had any issues with SQL (touch wood). We don't run any modules and our DB is only 10 gig, so it's hardly a huge install, but it works fine for us, and it's the way the company's going, along with a huge investment in virtualisation.

stacy.patten
13 Aug 2008 at 3:58PM CST

I can advise that we currently run Orion SLX w/ App Monitor, Wireless and Cirrus and have huge issues with the SAN.  I can see our disk queue typically at 250 and we are looking to move away from this setup.  Maybe with the developments of current SAN technology this is not a problem, but it is a disaster for us.

kbrewer
16 Sep 2008 at 2:39PM CST

I have NPM with NTA with a SQL server using Fibre Channel-connected SAN storage. I have no issues with the performance. The sample I just ran has the server with an Avg. Disk Queue Length of just over 7. The SQL server is a dedicated server.

gbrance
26 Sep 2008 at 4:34PM CST

I have an EMC CX3_40 SAN using Fibre Channel. The SQL Server has 8GB of memory. I have over 6000 elements that I am monitoring. I am also running the NetFlow and APM modules. I usually see Avg. Disk Queue Lengths of less than 8 on the server. However, the LUN is a RAID 5 10K RPM RAID group with both the MDF and LDF files on the same drive. This is also on a 2GB bus. I noticed some slowness issues. I recently moved the data storage to a drive on RAID 10 15K RPM storage with a 4GB FC bus. The MDF and LDF files are on separate drives now and I have noticed an improvement in responsiveness from Orion. The Avg. Disk Queue Length is below .5 on this SQL cluster.

NinjaNerd56
11 Nov 2009 at 4:40PM CST

Josh,

I fought AGAINST having NPM on a SAN at my last job...and felt somewhat justified when performance was clearly better on local 10K RPM disks.

Now I'm at a job where NPM w/NTA is on a SAN. I expressed my concerns and cited the docs, previous experience, etc. The "compromise" was a dedicated SQL 2005 server with FC disks on the SAN.

So far, the performance is outstanding and there are no data drops seen.

I would add that I tuned all the servers' TCP/IP stacks, swap space, startup loading sequence, and a host of things I've acquired over the years to mitigate any issues.

One metric was the TCP retransmission count during a fixed test: logging in to Orion, waiting for a custom map with > 400 node objects to load, navigating to the NTA module, and waiting for the TopXX pie charts to display. Before "tuning", I was seeing 150-200 TCP retransmissions. Afterwards...a total of 2.

Every little "bit" helps....


