CEPH: TCP Performance Tuning

ethernet-cable-fade

Below are a few TCP tunables that I ran into when looking into TCP performance tuning for CEPH.

Note that there are two separate sections for 10GE connectivity, so you will want to test with both to find what works best for your environment.

To implement, we just add what is below to /etc/sysctl.d/99-sysctl.conf and run “sysctl -p“. Changes are persistent across reboots. Ideally these TCP tunables should be deployed to all CEPH nodes (OSD most importantly).

## Increase Linux autotuning TCP buffer limits
## Set max to 16MB (16777216) for 1GE
## 32MB (33554432) or 54MB (56623104) for 10GE

# 1GE/16MB (16777216)
#net.core.rmem_max = 16777216
#net.core.wmem_max = 16777216
#net.core.rmem_default = 16777216
#net.core.wmem_default = 16777216
#net.core.optmem_max = 40960
#net.ipv4.tcp_rmem = 4096 87380 16777216
#net.ipv4.tcp_wmem = 4096 65536 16777216

# 10GE/32MB (33554432)
#net.core.rmem_max = 33554432
#net.core.wmem_max = 33554432
#net.core.rmem_default = 33554432
#net.core.wmem_default = 33554432
#net.core.optmem_max = 40960
#net.ipv4.tcp_rmem = 4096 87380 33554432
#net.ipv4.tcp_wmem = 4096 65536 33554432

# 10GB/54MB (56623104)
net.core.rmem_max = 56623104
net.core.wmem_max = 56623104
net.core.rmem_default = 56623104
net.core.wmem_default = 56623104
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 56623104
net.ipv4.tcp_wmem = 4096 65536 56623104


## Increase number of incoming connections. The value can be raised to bursts of request, default is 128
net.core.somaxconn = 1024

## Increase number of incoming connections backlog, default is 1000
net.core.netdev_max_backlog = 50000

##  Maximum number of remembered connection requests, default is 128
net.ipv4.tcp_max_syn_backlog = 30000

## Increase the tcp-time-wait buckets pool size to prevent simple DOS attacks, default is 8192
net.ipv4.tcp_max_tw_buckets = 2000000

# Recycle and Reuse TIME_WAIT sockets faster, default is 0 for both
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

## Decrease TIME_WAIT seconds, default is 30 seconds
net.ipv4.tcp_fin_timeout = 10
 
## Tells the system whether it should start at the default window size only for TCP connections
## that have been idle for too long, default is 1
net.ipv4.tcp_slow_start_after_idle = 0
 
#If your servers talk UDP, also up these limits, default is 4096
net.ipv4.udp_rmem_min = 8192
net.ipv4.udp_wmem_min = 8192
 
## Disable source redirects
## Default is 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0

## Disable source routing, default is 0
net.ipv4.conf.all.accept_source_route = 0

Reference here

HomeLab Adventures: Freenas Volume 1

Humpty_DumptySo I am not going to lie, I am a very sick man, but I am also not afraid to admit it. I have a terrible, terrible addiction which is my homelab.

 

It all started out so innocently… An old Sun Ultra 5 to learn Sparc Solaris at home.. A couple of desktops converted over to rack mount cases and racked in a cheap telecom rack in my unfinished basement. This was very early in my career when I had a lot to learn and plenty of free time to study. However that was many moons ago.

 

I measure the time that has past since then by the amount of gray that has crept into my beard. As I moved from one role to the next, I found that I had the pick of the litter when it came to retired equipment.

 

Previously I would have been lucky to land an old Xeon (without virtualization support) to take home, something chock full of PCI-X cards (or worse, SCSI) that were useless to me in a desktop. However now I was landing quad core Nehalems (perfect for virtualization) with handfuls of memory and sexy pci-e SAS/Sata raid controllers. Oh and tons of SSDs that were considered too small not 6 months after they were unboxed. Lets not even get into my networking setup… as that is a tale for a different day.

Once I had a deployed a couple of very nice and fully loaded ESX servers, I came to find that the performance bottleneck in my lab was storage. Sure I had terabytes of SAS and SATA disk, but it was all local. I had nothing that allowed me to fail over between host. Thus began a quest.. a quest for the ages.

 

Knowing myself as I do, I knew that I was not going to be satisfied by throwing a cheap NAS together out of a couple or SATA disk. No, desktop performance was not going to cut it. I needed 15k SAS, a raid controller with battery backup, a handful of spindles, and a beefy tower to allow for plenty of expansion (yes, all my machines were converted to towers). I also knew I was going to need to use LACP or some other network bonding to cable my creation into my network. Heck, I even dared check out the cost of a cheap 10Gb small business class switch (yup too expensive… lets wait a year or so).

Which brings us to today. The day I fired up my first freenas box.

My rough specs are as follows.

  • Gigabyte Z97-HD3
  • Intel Core i3 3.8Ghz
  • 5x600gb 15K SAS -Raid-Z1
  • 1x32gGB SSD
  • 2x4tb 7k SATA – Raid 1
  • 16GB Memory
  • LSI 9260 8i

 

So now what – move some VMS onto it and call it a day. Well that’s no fun. Lets see what kind of performance we can push through this baby. I mean after all, we are not using 15k SAS drives for nothing.

 

Side note, its not exactly plug and play when it comes to using SAS drives in a standard tower. Even if you have a SAS capable controller, you are going to need a backplane of some sort to provide power and i/o connectivity. Finding something that will fit the bill, without having to use a cheap one-off backplane is a challenge to say the least. For my lab I picked up a couple of these. 99% of what you see in the box stores will not support SAS drives, and its not always obvious at first glance… you have to check the specs on the side of the box. Also don’t walk into Fry’s thinking you will find one… I have tried. Microcenter seems to be the only large chain that stocks an internal SAS enclosure.

 

For testing I am have ssh’d into a linux desktop that is on the same network as the freenas box. The desktop has only 1gb network interface. Both systems a cabled northbound to a Cisco 3560g.

 

First let’s mount up our RaidZ-1 volume by sticking this in our /etc/fstab and running mount  /mnt.

freenas:/mnt/freenas-vol-1      /mnt    nfs rsize=8192,wsize=8192,timeo=14,intr

 

Boom, there it is our new fancy mount. Now to run the tests. However that will come in part 2 as I plan not to rush through this. As far as I understand, there can be a bit of tuning in Freenas, so it might take me a bit to get everything dialed in.

Related articles

Turn an Old Computer Into a Do-Anything Home Server with FreeNAS 8
Configuring ZFS on FreeNAS for backup storage from a Windows Domain
Sync Hacks: How to Set Up FreeNAS with BitTorrent Sync Using a Plugin
RHEL6 – Quick and Dirty NFS How To

Basic AIX Performance Troubleshooting Commands

600px-Orange_x.svgWow, today I logged into my first AIX Server in about 4.5 years. It was a horrible experience. I’ve been working with Redhat/CentOS pretty much exculsively for so long, I was mostly helpless to do anything of importance on the CLI other than create a few users and move some files around.  None of the common commands that I am so used to using even exist in AIX.

Figured I would do a bit of homework and figure out how to do some basic troubleshooting before I was in a server down situation with no idea how to troubleshoot.

Checking Free Memory

To check free memory on a box use the svmon command.

svmon -G

Overall System Status

For this you will probably want to use topas, which is pretty simiar to top. Topas gives you a quick and dirty overview of what is going on on a system. Here you can find CPU usage, top processes, disk utililization. Check out the fancy screen shot below.

Top-ass1

List Volume Groups

Wow, Linux has really confused me on this one. Anyway, use lsvg

# lsvg -o
rootvg
crsrdb_bin
crsprdb_data
crsprdb_index
crsprdb_arch
crsprdb_rman

List Info About a Volume Group.

# lsvg rootvg

Display Names of all Logical Volumes in a Volume Group.

# lsvg -l rootvg

Display Physical Memory

# lsattr -El sys0 -a realmem

Finding Disk I/O Issues

Sar appears to be a fine option here. Especially since I am looking for percent busy. Iostat also exists on AIX, btw.

# sar -d 1 2

Show Network Throughput

The more I poke around the internet trying to figure out how to actually use AIX the more I keep running into topas. Anyway this one is a good one

#topas -E

I plan to have more of these one liners documented here in the future, but for now this is going to have to do.