HomeLab Adventures: FreeNAS Volume 1

So I am not going to lie: I am a very sick man, but I am also not afraid to admit it. I have a terrible, terrible addiction, and it is my homelab.

 

It all started out so innocently… An old Sun Ultra 5 to learn SPARC Solaris at home. A couple of desktops converted over to rack mount cases and racked in a cheap telecom rack in my unfinished basement. This was very early in my career, when I had a lot to learn and plenty of free time to study. However, that was many moons ago.

 

I measure the time that has passed since then by the amount of gray that has crept into my beard. As I moved from one role to the next, I found that I had the pick of the litter when it came to retired equipment.

 

Previously I would have been lucky to land an old Xeon (without virtualization support) to take home, something chock full of PCI-X cards (or worse, SCSI) that were useless to me in a desktop. Now, however, I was landing quad-core Nehalems (perfect for virtualization) with handfuls of memory and sexy PCIe SAS/SATA RAID controllers. Oh, and tons of SSDs that were considered too small not 6 months after they were unboxed. Let's not even get into my networking setup… that is a tale for a different day.

Once I had deployed a couple of very nice and fully loaded ESX servers, I came to find that the performance bottleneck in my lab was storage. Sure, I had terabytes of SAS and SATA disk, but it was all local. I had nothing that allowed me to fail over between hosts. Thus began a quest… a quest for the ages.

 

Knowing myself as I do, I knew that I was not going to be satisfied by throwing a cheap NAS together out of a couple of SATA disks. No, desktop performance was not going to cut it. I needed 15K SAS, a RAID controller with battery backup, a handful of spindles, and a beefy tower to allow for plenty of expansion (yes, all my machines were converted to towers). I also knew I was going to need to use LACP or some other network bonding to cable my creation into my network. Heck, I even dared check out the cost of a cheap 10Gb small-business-class switch (yup, too expensive… let's wait a year or so).

Which brings us to today. The day I fired up my first FreeNAS box.

My rough specs are as follows.

  • Gigabyte Z97-HD3
  • Intel Core i3 3.8GHz
  • 5x600GB 15K SAS - RAID-Z1
  • 1x32GB SSD
  • 2x4TB 7K SATA - RAID 1
  • 16GB Memory
  • LSI 9260 8i
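
As a back-of-the-envelope check on what those disk groups should yield, here is a rough Python sketch. It assumes the usual rule of thumb that RAID-Z1 gives up one disk's worth of space to parity and that a two-way mirror gives you one disk's worth of space; it ignores filesystem overhead and the GB versus GiB difference, so treat the numbers as approximations.

```python
def raidz1_usable_gb(disks, size_gb):
    """Approximate usable space of a RAID-Z1 group: one disk's worth holds parity."""
    if disks < 2:
        raise ValueError("RAID-Z1 needs more than one disk")
    return (disks - 1) * size_gb


def mirror_usable_gb(size_gb):
    """A two-way mirror stores the same data twice, so usable space is one disk."""
    return size_gb


sas_gb = raidz1_usable_gb(5, 600)   # the 5x600GB 15K SAS group
sata_gb = mirror_usable_gb(4000)    # the 2x4TB SATA mirror
print(f"RAID-Z1: about {sas_gb} GB usable, mirror: about {sata_gb} GB usable")
```

So the SAS group should land somewhere around 2.4TB before overhead.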

 

So now what? Move some VMs onto it and call it a day? Well, that's no fun. Let's see what kind of performance we can push through this baby. I mean, after all, we are not using 15K SAS drives for nothing.

 

Side note: it's not exactly plug and play when it comes to using SAS drives in a standard tower. Even if you have a SAS-capable controller, you are going to need a backplane of some sort to provide power and I/O connectivity. Finding something that will fit the bill without resorting to a cheap one-off backplane is a challenge, to say the least. For my lab I picked up a couple of these. 99% of what you see in the box stores will not support SAS drives, and it's not always obvious at first glance… you have to check the specs on the side of the box. Also, don't walk into Fry's thinking you will find one… I have tried. Microcenter seems to be the only large chain that stocks an internal SAS enclosure.

 

For testing I have ssh'd into a Linux desktop that is on the same network as the FreeNAS box. The desktop has only a 1Gb network interface. Both systems are cabled northbound to a Cisco 3560G.
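
Since the test client only has a single 1Gb interface, it is worth knowing the best case before running anything. The sketch below is plain arithmetic on the raw line rate; it ignores Ethernet, TCP and NFS overhead, and the 40GB file size is just a made-up example, so real numbers will come in lower.

```python
def link_ceiling_mb_s(link_gbit):
    """Raw line rate in megabytes per second (8 bits per byte, decimal units)."""
    return link_gbit * 1000 / 8


def best_case_seconds(size_gb, link_gbit):
    """Time to move size_gb gigabytes at the raw line rate, with zero overhead."""
    return size_gb * 1000 / link_ceiling_mb_s(link_gbit)


print(link_ceiling_mb_s(1))      # 125.0 MB/s ceiling for a 1Gb link
print(best_case_seconds(40, 1))  # 320.0 seconds for a hypothetical 40GB file
```

Anything the array can do beyond roughly 125 MB/s will not show up on this client.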

 

First let's mount up our RAID-Z1 volume by sticking this in our /etc/fstab and running mount /mnt.

freenas:/mnt/freenas-vol-1      /mnt    nfs rsize=8192,wsize=8192,timeo=14,intr

 

Boom, there it is: our new fancy mount. Now to run the tests. However, that will come in part 2, as I plan not to rush through this. As far as I understand, there can be a bit of tuning in FreeNAS, so it might take me a bit to get everything dialed in.


Much Todo About Linux/RHEL Passwords

My latest gig requires me to know more about passwords, password expiration, and password policies than I have ever had to know before. On the surface this is a bad thing: it makes my job much harder, as I have to maintain more passwords on more individual systems than I can shake a stick at (seriously, no LDAP or anything). On the plus side, I am learning a few things here and there that I never had to know before. I thought I would take this opportunity to jot down a few of the things that I have learned.

Password Reuse Policy

The configuration item for this can be changed by editing the following file.

/etc/pam.d/system-auth

Look for the pam_unix.so line that ends in a "remember" option. The example below will remember the last 5 passwords and will not allow you to reuse any of them.

password    sufficient    pam_unix.so sha512 shadow nullok try_first_pass remember=5

Old passwords are actually stored in the following text file /etc/security/opasswd.

Password Aging Policy

The configurations for password aging are found in /etc/login.defs. Below I am requiring users to change their password every 28 days, forcing them to keep a password for at least 7 days before changing it, configuring the minimum password length, and setting the number of days of warning that will be given before I expire a password.

PASS_MAX_DAYS   28
PASS_MIN_DAYS   7
PASS_MIN_LEN    8
PASS_WARN_AGE   7
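
To see what those numbers mean on a calendar, here is a small Python sketch that parses the settings above and works out the key dates for one password change. The last-change date is a made-up example, and the parser only handles simple "NAME value" lines, which is an assumption about the file layout rather than a full login.defs parser.

```python
from datetime import date, timedelta

LOGIN_DEFS = """\
PASS_MAX_DAYS   28
PASS_MIN_DAYS   7
PASS_MIN_LEN    8
PASS_WARN_AGE   7
"""


def parse_login_defs(text):
    """Return {name: value} for simple 'NAME value' lines, skipping comments."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def aging_dates(last_change, settings):
    """Work out when a password may be changed, when warnings start, and when it expires."""
    expires = last_change + timedelta(days=int(settings["PASS_MAX_DAYS"]))
    return {
        "earliest_change": last_change + timedelta(days=int(settings["PASS_MIN_DAYS"])),
        "warning_starts": expires - timedelta(days=int(settings["PASS_WARN_AGE"])),
        "expires": expires,
    }


# Hypothetical last password change, for illustration only.
for name, when in aging_dates(date(2024, 1, 1), parse_login_defs(LOGIN_DEFS)).items():
    print(name, when)
```

Keep in mind that these defaults are applied when an account is created; existing accounts keep whatever aging values they already have until you change them with chage.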

Password Encryption Method

This is also stored in /etc/login.defs. Here I am using SHA512.

# Use SHA512 to encrypt password.
ENCRYPT_METHOD SHA512

Password Complexity Settings

Take a look at the line below from /etc/pam.d/system-auth

password    requisite     pam_passwdqc.so min=disabled,disabled,8,8,8 enforce=everyone retry=3

OK, now this one is a bit tricky, but the above essentially disallows passwords drawn from any single character class, disallows passwords with only two character classes, sets a minimum length of 8 characters for a passphrase, a minimum length of 8 characters for a password with three character classes, and a minimum length of 8 characters for a password with four character classes.
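
To make the min= list a little more concrete, here is a deliberately simplified Python sketch of the class-counting idea. It is not how pam_passwdqc itself decides: the real module has extra rules (for example around passphrases and about which characters count toward a class) that this ignores. The table below simply mirrors min=disabled,disabled,8,8,8 for the one-, two-, three- and four-class cases.

```python
import string


def count_classes(password):
    """Count how many of lowercase, uppercase, digits and 'other' appear."""
    classes = [string.ascii_lowercase, string.ascii_uppercase, string.digits]
    found = sum(any(ch in cls for ch in password) for cls in classes)
    if any(not ch.isalnum() for ch in password):
        found += 1  # punctuation, spaces and other non-alphanumerics
    return found


# Minimum length per number of character classes; None means "disabled".
# The passphrase slot of min= is intentionally left out of this sketch.
MIN_LENGTH_BY_CLASSES = {1: None, 2: None, 3: 8, 4: 8}


def acceptable(password):
    required = MIN_LENGTH_BY_CLASSES.get(count_classes(password))
    return required is not None and len(password) >= required


for candidate in ["password", "abc123", "aB3$", "c0rrect-Horse"]:
    print(candidate, acceptable(candidate))
```

Only the last candidate passes here: the first two use too few classes and the third is too short.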

Locking User Accounts Based on Failed Logins

Ok so this one also comes from /etc/pam.d/system-auth.

auth        required      pam_tally2.so deny=3 onerr=fail unlock_time=300 magic_root

Above I am locking an account once its failed-login count exceeds 3, so the fourth attempt is refused, and keeping it locked for 300 seconds, or 5 minutes. Man I am an ass.

 


Recovering from failed vxevac


If you have ever evacuated disks in Veritas, you know that every so often the evac will hang. Usually you terminated your session, or who knows what. Kinda like Joe Girardi's willingness to sacrifice outs for no good reason every time the Yankees' hottest hitter is at the plate. It happens, you can't explain it, you move on. Back to technology: vxtask list shows no tasks, but you get errors trying to rerun the failed evac.

 

For example:

Plex %5 in volume rman is locked by another utility

Plex rman-01 in volume rman is locked by another utility

Subdisk rman_7_tmp-01 in plex rman-01 is locked by another utility

vxprint -hf is our best friend, as it shows you any flags that are set

v  rman          fsgen   ENABLED 15625864960 -           ACTIVE ATT1    -
pl %5            rman    ENABLED 11719399168 -           TEMPRM SDMVTMP -
sd rman_6-01     %5      ENABLED 1953232896  9766166272  -      SDMVDST -
pl rman-01       rman    ENABLED 15625864960 -           ACTIVE SDMV1   -
sd rman_1-01     rman-01 ENABLED 1953234688  0           -      -       -
sd rman_2-01     rman-01 ENABLED 1953232896  1953234688  -      -       -
sd rman_3-01     rman-01 ENABLED 1953232896  3906467584  -      -       -
sd rman_4-01     rman-01 ENABLED 1953232896  5859700480  -      -       -
sd rman_5-01     rman-01 ENABLED 1953232896  7812933376  -      -       -
sd rman_6_tmp-01 rman-01 ENABLED 1953232896  9766166272  -      SDMVSRC -
sd rman_7_tmp-01 rman-01 ENABLED 1953232896  11719399168 -      -       -
sd rman_8-01     rman-01 ENABLED 1953232896  13672632064 -      -       -
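
On a bigger volume it is easy to miss a flag by eye, so here is a rough Python sketch that picks the flagged records out of a listing like the one above. It assumes each record has nine whitespace-separated columns with the flag in the eighth (the layout shown in this post), and it treats both a plain hyphen and a typographic dash as "empty". The sample text is a shortened copy of the output above.

```python
VXPRINT = """\
v  rman    fsgen        ENABLED  15625864960 -     ACTIVE   ATT1    -
pl %5           rman   ENABLED  11719399168 -     TEMPRM   SDMVTMP -
sd rman_6-01 %5         ENABLED  1953232896 9766166272 -    SDMVDST -
pl rman-01 rman    ENABLED  15625864960 -     ACTIVE   SDMV1   -
sd rman_1-01 rman-01 ENABLED 1953234688 0     -        -       -
sd rman_6_tmp-01 rman-01 ENABLED 1953232896 9766166272 - SDMVSRC -
"""

EMPTY = {"-", "\u2013"}  # plain hyphen, or the en dash a blog editor may substitute


def flagged_records(text):
    """Return (type, name, flag) for every record whose eighth column is set."""
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # not a record line in the assumed nine-column layout
        record_type, name, flag = fields[0], fields[1], fields[7]
        if flag not in EMPTY:
            flagged.append((record_type, name, flag))
    return flagged


for record in flagged_records(VXPRINT):
    print(record)
```

Run against the sample, this reports the volume, both plexes and the two subdisks involved in the move, and skips rman_1-01.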

We can see that we have flags set on the temporary plex (from the failed evac), the subdisk for the temporary plex, the main plex, the subdisk in the main plex, as well as the volume itself. We need to clear these flags to be able to restart our evac. I will also cut the lines of the vxprint output that don't change, to shorten this post.

vxmend -g rman_dg clear all rman %5

So we cleared the volume and temp plex flags, here's the vxprint -htf output afterwards

v  rman          fsgen   ENABLED 15625864960 -           ACTIVE -       -
pl %5            rman    ENABLED 11719399168 -           TEMPRM -       -
sd rman_6-01     %5      ENABLED 1953232896  9766166272  -      SDMVDST -
pl rman-01       rman    ENABLED 15625864960 -           ACTIVE SDMV1   -
sd rman_6_tmp-01 rman-01 ENABLED 1953232896  9766166272  -      SDMVSRC -

 

So now with the flags cleared we can remove the temporary plex

vxplex -g rman_dg -o rm dis %5

 

And once again our new vxprint -htf

v  rman          fsgen   ENABLED 15625864960 -           ACTIVE -       -
pl rman-01       rman    ENABLED 15625864960 -           ACTIVE SDMV1   -
sd rman_6_tmp-01 rman-01 ENABLED 1953232896  9766166272  -      SDMVSRC -

 

Great, now we are down to two flags: the one on the plex and the one on the source disk of our original evac. Clearing flags from subdisks is a lot trickier than clearing flags from volumes and plexes. Because the tutil0 flag is already set, we will need to force the clear. We clear it by setting it to "".

vxedit -g rman_dg -f set tutil0="" rman_6_tmp-01

 

Once again, vxprint -htf

v  rman          fsgen   ENABLED 15625864960 -           ACTIVE -       -
pl rman-01       rman    ENABLED 15625864960 -           ACTIVE SDMV1   -
sd rman_6_tmp-01 rman-01 ENABLED 1953232896  9766166272  -      -       -

 

And lastly, we clear the flag on the plex.  Why in this order?   Because I'm writing this up after I fixed my issues.  In the interest of not editing vxprint outputs, it's like this.  In retrospect, this could have been cleared with the first one we ran in the beginning.

vxmend -g rman_dg clear all rman rman-01

 

And finally, the way a vxprint -htf should look when all is healthy.

v  rman          fsgen   ENABLED 15625864960 -           ACTIVE -       -
pl rman-01       rman    ENABLED 15625864960 -           ACTIVE -       -
sd rman_1-01     rman-01 ENABLED 1953234688  0           -      -       -
sd rman_2-01     rman-01 ENABLED 1953232896  1953234688  -      -       -
sd rman_3-01     rman-01 ENABLED 1953232896  3906467584  -      -       -
sd rman_4-01     rman-01 ENABLED 1953232896  5859700480  -      -       -
sd rman_5-01     rman-01 ENABLED 1953232896  7812933376  -      -       -
sd rman_6_tmp-01 rman-01 ENABLED 1953232896  9766166272  -      -       -
sd rman_7_tmp-01 rman-01 ENABLED 1953232896  11719399168 -      -       -
sd rman_8-01     rman-01 ENABLED 1953232896  13672632064 -      -       -

 

At this point, feel free to proceed with your evac again.  If you're wondering what the putil and tutil fields are, here is what I found courtesy of Symantec:

http://www.symantec.com/business/support/index?page=content&id=TECH15609

 

Guest Authored By: @momkvi

 

Resolving SCSI Reservation Conflicts/Locks in Vsphere 4.0

A few days ago we got hit with a ton of alerts indicating that a handful of VMs were down, then up, then down again. This cycle repeated several times.

At first, after a bit of digging through logs, we thought that the issue was related to SCSI reservation errors, but we were already compliant with the best practices for 3PAR mentioned here. So we dug deeper and found that we were in fact suffering from SCSI locks. Go here for more information.

According to VMware…

"The second category involves acquisition of locks. These are locks related to VMFS specific meta-data (called cluster locks) and locks related to files (including directories). Operations in the second category occur much more frequently than operations in the first category. The following are examples of VMFS operations that require locking metadata:

  • Creating a VMFS datastore
  • Expanding a VMFS datastore onto additional extents
  • Powering on a virtual machine
  • Acquiring a lock on a file
  • Creating or deleting a file
  • Creating a template
  • Deploying a virtual machine from a template
  • Creating a new virtual machine
  • Migrating a virtual machine with VMotion
  • Growing a file, for example, a Snapshot file or a thin provisioned Virtual Disk

To resolve a SCSI Lock, log into each of your ESX boxes and run the following command. 

# esxcfg-info | egrep -B5 "s Reserved|Pending"

Look for the output below; the host that has a "Pending Reservations" value greater than zero is the one causing the lock.

|----Pending Reservations................. 1
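
If you have more than a couple of hosts, checking each one by eye gets old. Below is a rough Python sketch that scans saved command output per host for that line. The host names and sample text are made up, the regular expression only assumes the line contains "Pending Reservations" followed eventually by a number, and treating any non-zero count as the culprit is my assumption based on the sample above.

```python
import re

# Matches "Pending Reservations", any filler characters, then the count.
PENDING = re.compile(r"Pending Reservations\D*(\d+)")


def hosts_with_pending(outputs):
    """outputs maps host name -> captured esxcfg-info text; return suspect hosts."""
    suspects = []
    for host, text in outputs.items():
        counts = [int(match.group(1)) for match in PENDING.finditer(text)]
        if any(count > 0 for count in counts):
            suspects.append(host)
    return suspects


# Hypothetical captured output from two hosts.
captured = {
    "esx01": "|----Pending Reservations................. 0\n",
    "esx02": "|----Pending Reservations................. 1\n",
}
print(hosts_with_pending(captured))
```

With the sample data, only esx02 is reported.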

Now reset the lun.

vmkfstools --lock lunreset /vmfs/devices/disks/vml.02000000006001c230d8abfe000ff76c198ddbc13e50455243


Veritas Enterprise Administrator Not Displaying Objects/Agents

It's been a while since I have done much with Veritas Storage Foundation, so I was at a bit of a loss after firing up the VEA GUI on a fresh install on CentOS 5.4 and not seeing any of the agents that I was used to seeing. A quick check on the command line showed that the Storage Agent was in fact running. A vxdisk list displayed my disks without issue, but the GUI was blank except for the server name.

A quick Google search led me to this Symantec KB article…

http://seer.entsupport.symantec.com/docs/302156.htm.

While this article was helpful, it did not solve my issue since, as I mentioned above, my storage agent was in fact running. However, I was able to verify similar errors in my vxsis.log:

Thu Mar 27 12:40:03 2008:3342:get_objects_by_type: Error in GetObjects : 0xc1000039
Thu Mar 27 12:40:03 2008:3342:database::HandleILCacheRequest:Error in get_objects_by_type() : 0xc1000039
Thu Mar 27 12:40:03 2008:3342:rpc_object_fetch: Could not fetch objects

So I started searching on the errors above and ran into this post on the Symantec forum, where one of the posters suggested that the /etc/hosts file might be the culprit.

Sure enough, a quick check of my /etc/hosts revealed that my server's IP and FQDN were not present. I corrected that and then followed the steps below.

First I stopped VEA Service

> /opt/VRTS/bin/vxsvcctrl stop    

Then I reconfigured Veritas

> /opt/VRTS/install/installsf -configure

I restarted my VEA GUI and reconnected and found all was as it should be.
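
Since the root cause was a missing /etc/hosts entry, a quick check like the one below would have saved me some searching. This is only a Python sketch: the IP address and host name are placeholders, and it does nothing more than look for a line that starts with the given IP and lists the given FQDN.

```python
def hosts_has_entry(hosts_text, ip, fqdn):
    """True if some non-comment line maps ip to fqdn (as a name or an alias)."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip and fqdn in fields[1:]:
            return True
    return False


# Placeholder values for illustration only.
SAMPLE_HOSTS = """\
127.0.0.1     localhost.localdomain localhost
192.168.1.50  vea01.example.com vea01   # storage server
"""

print(hosts_has_entry(SAMPLE_HOSTS, "192.168.1.50", "vea01.example.com"))
print(hosts_has_entry(SAMPLE_HOSTS, "192.168.1.51", "vea02.example.com"))
```

To check a real box you would read /etc/hosts and pass in the server's own IP and FQDN.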

Raid Levels Explained and Simplified

As a Systems Administrator, I deal with RAID 1 (mirroring) pretty much exclusively. Hell, nowadays when you build a server, it automatically mirrors your operating system disks for you, which means that you do not even need to understand what is happening behind the scenes. You just pop your two drives in your server and go. However, the world of the SAN administrator is much more complicated.

First off, it's important to know that RAID stands for either "Redundant Array of Independent Disks" or, less commonly, "Redundant Array of Inexpensive Disks". Either way you slice it (pun intended), the basic idea of RAID is to combine multiple hard disks to either increase performance or increase redundancy.

Before I get started, it's important to introduce the term LUN. A LUN is a logical disk that consists of raw physical disk space. LUNs are created as a basic part of the storage provisioning process. They are presented across a SAN to a server as a single physical disk.

Note that the title of this article is "Raid Levels Explained and Simplified", and when I say simplified I mean it. I am going to give a brief overview of most of the common RAID levels and then present a strength and a weakness for each. Scroll down to the bottom of the article for links to more in-depth articles and web pages.

 

RAID 0: Striped…No Fault Tolerance

OK, in my opinion, and in the opinion of many others, RAID 0 is not even RAID, because there is no redundancy. If a disk fails, you are toast. Basically you take a slice of two or more disks and create a LUN. For example, let's say that you as the sysadmin request one 80GB disk from your local SAN admin. In the scenario below your SAN guru would carve 8 10GB blocks and present them in order (block 1,2,3,4,5,6,7,8) to you as a single LUN. RAID 0 provides good read and write performance. In the end, RAID 0 is striping, which is probably the most important thing you need to know about it.

Raid0
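
To show what "striping" means in practice, here is a tiny Python sketch that maps each block of the LUN onto a disk in round-robin order. The two-disk layout is an assumption purely for illustration, and blocks are numbered from 0 here rather than from 1 as in the text.

```python
def locate(logical_block, num_disks):
    """Map a logical block number to (disk index, block index on that disk)."""
    if num_disks < 1:
        raise ValueError("need at least one disk")
    return logical_block % num_disks, logical_block // num_disks


# Eight blocks (think 8 x 10GB) striped across a hypothetical pair of disks.
for block in range(8):
    disk, position = locate(block, 2)
    print(f"block {block} -> disk {disk}, position {position}")
```

Consecutive blocks alternate between the disks, which is where the read and write performance comes from, and also why losing either disk takes the whole LUN with it.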


Linux SAN Disk Management via DM-Multipath

A little background…
Most of the time, I have used the RDAC driver to manage SAN disks in Linux. The RDAC driver is used to hide the complexity of multiple paths and to present redundant paths as a single path, which can be used as you would a standard SCSI / IDE / SAS / SATA drive. Seeing only one device makes managing your disks much easier.

However, where I work we only use RDAC with our IBM FAStT, Sun 6140 and STK Flexline storage arrays. RDAC is only for LSI-based storage, not for arrays such as Hitachi, CLARiiON, and EMC. For those we manage SAN disk with DM-Multipath.

Setup…
Setting up DM-Multipath is not hard. First you need to make sure that you install the package, device-mapper-multipath; then you will need to configure your multipath.conf and drop it into /etc. Below is some info on how to do so.

http://kbase.redhat.com/faq/docs/DOC-3691

You will also need to make sure that you enable the multipathd daemon. This daemon is in charge of checking for failed paths.

Multipath Command…
For those used to using RDAC, DM-Multipath takes some getting used to, especially when you see the output from fdisk -l.

In one particular instance I was given /dev/sdm as the name of the new disk on this box. The output from the fdisk -l command is not exactly helpful, as there are a ton of pseudo devices showing up in my output. This is where the multipath command comes in handy.
