Computers: Can't Live With 'Em, Pass the Beer Nuts

Friday, December 7, 2007

Reason #682 Why I Get Frustrated with IBM

An application developer down the hall came in the office late yesterday asking if I could install the HP LaserJet 4000 printer drivers on one of our IBM AIX servers. "Sure, no problem," I said, knowing in my core I was about to embark on a painful adventure.

This morning, I set about trying to locate the printer drivers on IBM's website. Fifteen minutes of thrashing about... no deal. They link to a huge list of some Infoprint driver crap, but nothing for HP printers.

I checked out HP's website, which is almost as poorly designed as IBM's. IBM's is worse because not only is the design bad, but they don't let you download anything useful. HP has lots of useful stuff on their website, but it's just really hard to find.

Anyway, after a few minutes of thrashing on HP's website, I found they have some generic un*x drivers for the LaserJet 4000 series, but I knew that wasn't going to fly on the AIX server. I needed the *.rte files from IBM.

I turned to Google, and found many posts to technical forums, and each one went like this:

Question: Where on the web can I find HP printer drivers for IBM?

Answer: You can't. I think they're on the installation CDs somewhere.

I then dove into my storage cabinet and pulled out my big box o' AIX cds. I located the AIX 5.2 install cds (seven of them) for that server as well as numerous other randomly labeled media. I popped the first CD into my Mac and started searching for something that looked like printer drivers. About this time, the developer guy popped in and said that his vendor said it was on disc one. Hmmm... okay.

Longer story shorter, it's not. It's disc three. I'll say it again to try to make sure Google picks it up for those of you out there searching the web. The HP LaserJet 4000 series printer drivers for AIX 5.2 are on installation CD number 3 of 7.

Just pop that cd into your server, or remotely mount it via NFS like I did, and run:

# smitty printers
Printer/Plotter Devices
Install Additional Printer/Plotter Software
choose /cdrom or whatever mount point you used
hit list to choose the printer drivers you want

The trick is finding them. Once you find them, it's super easy.
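Since finding them is the hard part, something like the following can search each mounted disc for printer-looking files. This is only a sketch: the function name find_printer_files is made up, and the default 'printers*' pattern assumes the driver packages are plain files whose names start with "printers", which may not match every AIX release.

```shell
#!/bin/sh
# Sketch: list files under a mount point whose names match a pattern.
# Assumption: driver packages are plain files named something like "printers*".
find_printer_files() {
    dir=$1
    pattern=${2:-'printers*'}
    find "$dir" -type f -name "$pattern" -print 2>/dev/null | sort
}

# Example: find_printer_files /cdrom
# Example: find_printer_files /cdrom '*hplj*'
```

Run it against each disc (or NFS mount point) in turn until something shows up.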

Thursday, December 6, 2007

Update: Virtual Frame Buffer for use with Oracle Reports Server

Well, it turns out that the handy VFB I described a couple posts down doesn't work with SQR. I found a Hyperion document that had corrected resolution settings for the virtual buffer:

/usr/openwin/bin/Xvfb :5 -dev vfb screen 0 1152x900x8 &
/usr/openwin/bin/twm -display :5 -v &
DISPLAY=:5.0; export DISPLAY

Apparently, SQR is picky about the resolution. 1152x900x8 works, 1600x1200x32 did not.
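Because the resolution turned out to matter, it may be worth keeping it in one place. The sketch below only builds and prints the Xvfb command line so it can be inspected before use. The function name xvfb_command is hypothetical, the geometry check is a rough sanity test rather than real validation, and the command syntax is simply the Solaris form shown above.

```shell
#!/bin/sh
# Sketch: build the Xvfb command line from a display number and a geometry.
# The command is printed, not run.
xvfb_command() {
    display=$1
    geometry=${2:-1152x900x8}   # WIDTHxHEIGHTxDEPTH; default is the value that worked with SQR
    case $geometry in
        *[!0-9x]*|x*|*x|*xx*)
            echo "bad geometry: $geometry" >&2
            return 1 ;;
    esac
    echo "/usr/openwin/bin/Xvfb :$display -dev vfb screen 0 $geometry"
}

# Example: xvfb_command 5
# Example: xvfb_command 5 1600x1200x32
```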

Wednesday, November 21, 2007

X Server Basics

We have been using a commercial X Windows server on our PCs to get GUI access to our unix boxes for years. Recently though, a new set of people here need to get access to a new AIX system we have. They didn't want to spring for commercial licenses, so I introduced them to the Cygwin/X server software from http://x.cygwin.com, available at no cost under a modified GNU license.

Download the software, making sure to choose to install the inetutils and xorg-x11 portions. I also installed the openssh piece so I could use SSH to connect to servers if I wanted to.

Once installed, launch Cygwin and it'll give you a unix-like terminal interface to your PC files.

Make sure dtlogin is running on the remote unix host. If it isn't, run this on the host:

# /usr/dt/bin/dtlogin &

Then on your PC in the Cygwin window, run:

Xwin -query <remote_hostname> -from <my_pc_hostname_or_ip>

That will launch the nice GUI Xwindows login for the remote host.
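If the login screen never appears, the first thing to rule out is dtlogin not running on the remote host. A small check along these lines reads ps -ef output and reports whether the daemon is there. The function name has_process is made up for illustration, and it assumes the command shows up in the ps output either as a bare name or as a path ending in that name.

```shell
#!/bin/sh
# Sketch: read "ps -ef" style output on stdin and succeed if any field is
# the given command name or a path ending in it.
has_process() {
    awk -v name="$1" '
        { for (i = 1; i <= NF; i++)
              if ($i == name || $i ~ ("/" name "$")) found = 1 }
        END { exit !found }'
}

# Example, run on the remote host:
#   if ps -ef | has_process dtlogin; then echo "dtlogin is up"; fi
```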

Friday, November 16, 2007

Virtual Frame Buffer for use with Oracle Reports Server

Our DBA set up Oracle Reports Server on one of my Solaris unix servers so that our applications folks can make pretty graphs and send them out to administration. Oracle Reports Server requires a connection to an X-Server to draw the pretty graphs, even though it's running in batch. Wonderful. There's no graphics console on the server, so...

Our DBA went over to a unix workstation he runs testing on, logged in, set xhost + to allow the process (well, everyone really) to connect, and then set the DISPLAY variable in the application script to connect to the workstation for the X-Server access. Pretty kludgy solution, but it worked.

All that worked fine until someone stepped on the switch on the power strip for the workstation and it was down over the weekend without anyone noticing. I started it back up, but didn't log in and had no idea about the DISPLAY setting on the other production server. Fast forward about a week, imagine applications developers running around screaming about their graphing not working, and you've got a good picture.

Our DBA eventually remembered the DISPLAY connection he'd set up, and we got the workstation logged back in. Then I went to work finding an alternative.

I located these documents:

http://www.sun.com/bigadmin/content/submitted/virtual_buffer.html
http://www.idevelopment.info/data/Unix/General_UNIX/GENERAL_XvfbWithOracle9iAS.shtml

They were helpful, but of course, were slightly incorrect. Based on their recommendations, with a tweak to the Xvfb command to get the syntax correct, I came up with the following:

In the script that does the graphics, these three lines start up the virtual frame buffer, twm, and set up the DISPLAY variable correctly:

/usr/openwin/bin/Xvfb :5 -dev vfb screen 0 1600x1200x32 &
/usr/openwin/bin/twm -display :5 -v &
DISPLAY=:5.0; export DISPLAY

At the end of the script, this line kills the vfb to make sure it's not hanging around doing nothing:

/usr/bin/kill `ps -ef | grep Xsun | grep :5 | awk '{print $2}'` > /dev/null 2>&1

Starting up the virtual frame buffer gives the Oracle Reports Server process something to connect to, and it runs on the local host even though there's no graphics console. Nice!
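The ps and grep pipeline above depends on how the process happens to show up in the process table. A possible alternative, sketched here, is to remember the PID of each helper when it starts and kill exactly those at the end. The names start_helper, stop_helpers, and HELPER_PIDS are invented for this example, and this is a variation on the approach above, not what the script actually did.

```shell
#!/bin/sh
# Sketch: track background helpers by PID and clean them up afterwards.
HELPER_PIDS=""

# Start a command in the background and remember its PID.
start_helper() {
    "$@" &
    HELPER_PIDS="$HELPER_PIDS $!"
}

# Kill every helper we started; ignore ones that already exited.
stop_helpers() {
    for pid in $HELPER_PIDS; do
        kill "$pid" 2>/dev/null
    done
    HELPER_PIDS=""
}

# In the report script this might look like:
#   trap stop_helpers 0
#   start_helper /usr/openwin/bin/Xvfb :5 -dev vfb screen 0 1600x1200x32
#   start_helper /usr/openwin/bin/twm -display :5 -v
#   DISPLAY=:5.0; export DISPLAY
#   ... run the report ...
```

The trap on signal 0 runs stop_helpers when the script exits, so the frame buffer goes away even if the report step fails partway through.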

Now I'll go log out of that workstation...

Friday, November 9, 2007

Customize Mailman Messages

To create a customized welcome message in Mailman 2.1.5 that's sent to users when they subscribe, follow these steps:

Create a directory mailman/data/lists/yourlist/en (for English language) and copy subscribeack.txt from /usr/local/mailman/templates into that directory. Customize it to whatever you like. Mailman will use this custom template for the welcome message.

Schweet. This is particularly handy for distribution-only lists where the "To post to this list..." instructions in the welcome message are confusing since regular subscribers would get rejected were they to follow those instructions.
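The copy step can be wrapped in a small function so it is repeatable across lists. This is a minimal sketch: install_list_template is a made-up name, the paths in the comments are the ones mentioned above and may differ on other installs, and the "don't overwrite an existing copy" behavior is my own choice rather than anything Mailman requires.

```shell
#!/bin/sh
# Sketch: copy a stock Mailman template into a list's language directory,
# without overwriting a copy that has already been customized.
install_list_template() {
    template=$1     # e.g. the stock subscribeack.txt under /usr/local/mailman/templates
    list_dir=$2     # e.g. mailman/data/lists/yourlist
    lang=${3:-en}
    dest="$list_dir/$lang/$(basename "$template")"
    mkdir -p "$list_dir/$lang" || return 1
    if [ -e "$dest" ]; then
        echo "keeping existing $dest" >&2
        return 0
    fi
    cp "$template" "$dest"
}
```

After the copy, edit the file in the list's directory, not the stock one.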

Purge Those MySQL Binary Logs

I'm sure this is old hat to real MySQL people, but I'm pretty new to MySQL, especially replication, and our web server is usually pretty quiet, so I was a little surprised when I got a disk space warning because the binary logs had grown so large.

Signing onto the slave server, I ran "show slave status;" at the mysql> prompt to show that the server was reading from the binary log called "mysql-bin.004" on the master.

Logging onto the master, I ran "show master logs;" at the mysql> prompt (show binary logs; is supposed to work but did not - probably a version thing) to display the current logs saved in the mysql/var directory:

mysql> show master logs;
+---------------+
| Log_name |
+---------------+
| mysql-bin.002 |
| mysql-bin.003 |
| mysql-bin.004 |
+---------------+
3 rows in set (0.00 sec)

I then ran the purge master logs command to get rid of the deadwood:

mysql> purge master logs to 'mysql-bin.004';
Query OK, 0 rows affected (0.02 sec)

mysql> show master logs;
+---------------+
| Log_name |
+---------------+
| mysql-bin.004 |
+---------------+
1 row in set (0.00 sec)

It deleted the big files and we're out of the woods for disk space. I should probably set the max_binlog_size variable a little lower so MySQL rotates to a new log more often. That way I get more, smaller logs and don't end up with one monstrous active log file and no old ones to purge.

Wednesday, October 24, 2007

Perfect Storm Disk Replacement

I recently had a drive go bad in a Sun StorEdge 3510 FC JBOD array connected to a V490 running Solaris 10 with Solaris Volume Manager. The disk was part of a five-disk stripeset that was mirrored with another stripeset.

It was *not* easy finding documentation for getting this done. Tools I'd used on other systems that had SCSI attached arrays and on systems with FC attached RAID arrays did not work. The combination of JBOD with FC on a 3510 managed with Solaris Volume Manager with an active hot spare made it interesting. So without further ado...

How To Replace a Failed Drive on a JBOD Sun StorEdge 3510 FC Array That Has Been Failed Over to a Hot Spare Managed by Volume Manager in Solaris 10 (whew!)

Here's the device with the bad disk c1t10d0s0 that was replaced with the hot spare from c1t11d0s0:

# metastat d15
d15: Mirror
Submirror 0: d16
State: Okay
Submirror 1: d17
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 716634624 blocks (341 GB)

d16: Submirror of d15
State: Okay
Hot spare pool: hsp000
Size: 716634624 blocks (341 GB)
Stripe 0: (interlace: 256 blocks)
Device Start Block Dbase State Reloc Hot Spare
c1t4d0s0 20352 Yes Okay Yes
c1t3d0s0 20352 Yes Okay Yes
c1t2d0s0 20352 Yes Okay Yes
c1t1d0s0 20352 Yes Okay Yes
c1t0d0s0 20352 Yes Okay Yes

d17: Submirror of d15
State: Okay
Hot spare pool: hsp000
Size: 716634624 blocks (341 GB)
Stripe 0: (interlace: 256 blocks)
Device Start Block Dbase State Reloc Hot Spare
c1t9d0s0 20352 Yes Okay Yes
c1t8d0s0 20352 Yes Okay Yes
c1t7d0s0 20352 Yes Okay Yes
c1t6d0s0 20352 Yes Okay Yes
c1t10d0s0 20352 No Okay Yes c1t11d0s0

Device Relocation Information:
Device Reloc Device ID
c1t4d0 Yes id1,ssd@n20000011c6968cf9
c1t3d0 Yes id1,ssd@n20000011c6967f16
c1t2d0 Yes id1,ssd@n20000011c6968c7c
c1t1d0 Yes id1,ssd@n20000011c68baaed
c1t0d0 Yes id1,ssd@n20000011c6968ca1
c1t9d0 Yes id1,ssd@n20000011c6967e6e
c1t8d0 Yes id1,ssd@n20000011c68b0388
c1t7d0 Yes id1,ssd@n20000011c68deaaf
c1t6d0 Yes id1,ssd@n20000011c6969259
c1t11d0 Yes id1,ssd@n20000011c68bbb2d
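In output this long, the one line that matters is the component with something in the Hot Spare column. A filter like the following picks those out. It is a sketch that assumes the column layout shown above, where a component line has five fields normally and a sixth when a hot spare is in use; the function name hot_spared_components is made up.

```shell
#!/bin/sh
# Sketch: read metastat output on stdin and print "component -> hot spare"
# for every component line that has a hot spare filled in.
hot_spared_components() {
    awk '
        $1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ &&
        NF == 6 &&
        $6 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ {
            print $1 " -> " $6
        }'
}

# Example: metastat d15 | hot_spared_components
```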

I removed the meta database replicas that were on c1t10d0 but I'm not convinced I had to do that before continuing.

The cfgadm command can show the attachment point for the disk.

# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 fc-private connected configured unknown
c1::22000011c68b0388 disk connected configured unknown
c1::22000011c68b5cb3 disk connected configured unknown
c1::22000011c68baaed disk connected configured unknown
c1::22000011c68bbb2d disk connected configured unknown
c1::22000011c68deaaf disk connected configured unknown
c1::22000011c6967e6e disk connected configured unknown
c1::22000011c6967f16 disk connected configured unknown
c1::22000011c6968c7c disk connected configured unknown
c1::22000011c6968ca1 disk connected configured unknown
c1::22000011c6968cf9 disk connected configured unknown
c1::22000011c6969259 disk connected configured unknown
c1::22000011c696a895 disk connected configured unknown
c1::225000c0ff086290 ESI connected configured unknown
c2 fc-private connected configured unknown
c2::500000e01127c191 disk connected configured unknown
c2::500000e01127c8a1 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb0/3 unknown empty unconfigured ok
usb0/4 unknown empty unconfigured ok

However, both the cfgadm and luxadm commands are unable to remove the drive since it's on a fiber loop and is a JBOD array.

# cfgadm -x replace_device c1::22000011c68b5cb3
cfgadm: Configuration operation not supported

# luxadm remove_device 22000011c68b5cb3

WARNING!!! Please ensure that no filesystems are mounted on these device(s).
All data on these devices should have been backed up.

Error: Invalid path. Device is not a SENA subsystem. - 22000011c68b5cb3.

Instead, use luxadm to offline the bad disk:

# luxadm -e offline /dev/rdsk/c1t10d0s2

Then devfsadm to remove the dev entries:

# devfsadm -Cv
devfsadm[3915]: verbose: removing file: /dev/dsk/c1t10d0s0
devfsadm[3915]: verbose: removing file: /dev/dsk/c1t10d0s1
devfsadm[3915]: verbose: removing file: /dev/dsk/c1t10d0s2
devfsadm[3915]: verbose: removing file: /dev/dsk/c1t10d0s3
devfsadm[3915]: verbose: removing file: /dev/dsk/c1t10d0s4
devfsadm[3915]: verbose: removing file: /dev/dsk/c1t10d0s5
devfsadm[3915]: verbose: removing file: /dev/dsk/c1t10d0s6
devfsadm[3915]: verbose: removing file: /dev/dsk/c1t10d0s7
devfsadm[3915]: verbose: removing file: /dev/rdsk/c1t10d0s0
devfsadm[3915]: verbose: removing file: /dev/rdsk/c1t10d0s1
devfsadm[3915]: verbose: removing file: /dev/rdsk/c1t10d0s2
devfsadm[3915]: verbose: removing file: /dev/rdsk/c1t10d0s3
devfsadm[3915]: verbose: removing file: /dev/rdsk/c1t10d0s4
devfsadm[3915]: verbose: removing file: /dev/rdsk/c1t10d0s5
devfsadm[3915]: verbose: removing file: /dev/rdsk/c1t10d0s6
devfsadm[3915]: verbose: removing file: /dev/rdsk/c1t10d0s7

The output from cfgadm now shows the device as unusable:

# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 fc-private connected configured unknown
c1::22000011c68b0388 disk connected configured unknown
c1::22000011c68b5cb3 disk connected configured unusable
c1::22000011c68baaed disk connected configured unknown
c1::22000011c68bbb2d disk connected configured unknown
[...snip...]
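To pull just the unusable attachment points out of a long listing, a one-line awk filter is enough. This is a sketch assuming the five-column cfgadm -al layout shown above, with the condition in the last column; unusable_disks is a name invented here.

```shell
#!/bin/sh
# Sketch: read "cfgadm -al" output on stdin and print the Ap_Id of every
# disk whose condition (last column) is "unusable".
unusable_disks() {
    awk '$2 == "disk" && $NF == "unusable" { print $1 }'
}

# Example: cfgadm -al | unusable_disks
```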

Physically replace the device. In the 3510 JBOD array with the default boxid of zero (check the button hidden under the left plastic ear tab), the disk layout looks like this:

0 3 6 9
1 4 7 10
2 5 8 11

(0 to 11 counting down columns first then over rows)
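The layout boils down to simple arithmetic: dividing the slot number by three gives the column, and the remainder gives the row. The sketch below spells that out. It assumes a boxid of zero so that the slot number matches the target number in the device name (the 10 in c1t10d0), and slot_position is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: turn a slot number (0 to 11) into a physical position in the
# 4-column by 3-row grid, counting down each column first.
slot_position() {
    slot=$1
    case $slot in
        ''|*[!0-9]*)
            echo "slot must be a number from 0 to 11" >&2
            return 1 ;;
    esac
    if [ "$slot" -gt 11 ]; then
        echo "slot must be a number from 0 to 11" >&2
        return 1
    fi
    echo "column $((slot / 3 + 1)) from the left, row $((slot % 3 + 1)) from the top"
}

# Example: slot_position 10
```

For slot 10 this gives the fourth column, second row, which matches the grid above.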

When the disk is replaced, the devfsadm daemon should pick up the disk immediately and configure the dev entries. If not, try this to see what the problem is:

# luxadm -e port
/devices/pci@9,600000/SUNW,qlc@2/fp@0,0:devctl CONNECTED
/devices/pci@8,600000/SUNW,qlc@1/fp@0,0:devctl CONNECTED

Note: If you get a "NOT CONNECTED" error on the 3510 path, check cfgadm to see if the fiber connection is connected.

# cfgadm -al
Ap_Id Type Receptacle Occupant Condition
c0 scsi-bus connected configured unknown
c0::dsk/c0t0d0 CD-ROM connected configured unknown
c1 fc-private connected configured unknown
c1::22000011c68b0388 disk connected configured unknown
c1::22000011c68baaed disk connected configured unknown
c1::22000011c68bbb2d disk connected configured unknown
c1::22000011c68deaaf disk connected configured unknown
c1::22000011c6967e6e disk connected configured unknown
c1::22000011c6967f16 disk connected configured unknown
c1::22000011c6968c7c disk connected configured unknown
c1::22000011c6968ca1 disk connected configured unknown
c1::22000011c6968cf9 disk connected configured unknown
c1::22000011c6969259 disk connected configured unknown
c1::22000011c696a895 disk connected configured unknown
c1::225000c0ff086290 ESI connected configured unknown
c1::500000e014cb0282 disk connected configured unknown
c2 fc-private connected configured unknown
c2::500000e01127c191 disk connected configured unknown
c2::500000e01127c8a1 disk connected configured unknown
usb0/1 unknown empty unconfigured ok
usb0/2 unknown empty unconfigured ok
usb0/3 unknown empty unconfigured ok
usb0/4 unknown empty unconfigured ok

If the controller isn't there or is unconfigured try the following:

# cfgadm -c configure cx

If the drives appear with a condition set to "unusable" do the following using the pathname from the luxadm -e port command above:

# luxadm -e forcelip /devices/pci@9,600000/SUNW,qlc@2/fp@0,0:devctl

Once the dev devices for the replaced drive are back in, use format to partition the new drive like the old one used to be. You can use the partition map from the hot spare as a template.
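A commonly used alternative to retyping the partition map in format is to copy the label from a healthy disk with prtvtoc and fmthard. That is not part of the procedure described here, so treat the following as a sketch only. The function copy_vtoc_command is made up, it only prints the pipeline so it can be reviewed first, and it assumes both disks are the same size with slice 2 covering the whole disk.

```shell
#!/bin/sh
# Sketch: print (not run) the pipeline that copies the partition table
# from one disk to another of the same size.
copy_vtoc_command() {
    src=$1   # disk with the wanted layout, e.g. c1t11d0 (the hot spare)
    dst=$2   # replacement disk, e.g. c1t10d0
    echo "prtvtoc /dev/rdsk/${src}s2 | fmthard -s - /dev/rdsk/${dst}s2"
}

# Example: copy_vtoc_command c1t11d0 c1t10d0
```

Double-check the source and destination before running the printed command, since fmthard overwrites the label on the destination disk.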

Once the drive is partitioned, add any database replicas that may have been on the original device (I should mention that I forgot to do that, so I'm not 100% sure that works), then do a metareplace to trigger the hot spare to go back to available and the replaced drive to start resyncing:

# metareplace -e d17 c1t10d0s0

Show progress with:

# metastat | grep %

Resync in progress: 73 % done

and see that the hot spare is available again with:

# metahs -i
hsp000: 2 hot spares
Device Status Length Reloc
c1t11d0s0 Available 143349312 blocks Yes
c1t5d0s0 Available 143349312 blocks Yes

Device Relocation Information:
Device Reloc Device ID
c1t11d0 Yes id1,ssd@n20000011c68bbb2d
c1t5d0 Yes id1,ssd@n20000011c696a895
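Both checks can be scripted so the resync can be left to run unattended. The two filters below are a sketch that assumes the output formats shown above: a "Resync in progress: NN % done" line from metastat, and "Available" in the second column of metahs -i. The names resync_percent and available_spares are invented for this example.

```shell
#!/bin/sh
# Sketch: read metastat output on stdin and print the resync percentage.
# Exits non-zero when no resync is in progress.
resync_percent() {
    awk '
        /Resync in progress:/ {
            for (i = 1; i < NF; i++)
                if ($i == "progress:") { print $(i + 1); found = 1 }
        }
        END { exit !found }'
}

# Sketch: read "metahs -i" output on stdin and count the available spares.
available_spares() {
    awk '$2 == "Available" { n++ } END { print n + 0 }'
}

# Example: wait for the resync to finish, then report the spares.
#   while metastat d15 | resync_percent; do sleep 60; done
#   metahs -i | available_spares
```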

keywords: 3510 storedge storagetek solaris volume manager hot spare fc fiber channel jbod