Sunday, April 3. 2011
Set APM and AAM feature configuration attributes for disks on Solaris
Unfortunately, there is no "hdparm" utility for Solaris, as there is for GNU/Linux. I had a look at the sources and porting it to Solaris will require a considerable effort. It is not difficult, at least not for setting/getting the various attributes, it is just that "hdparm" was probably never written with portability to other operating systems in mind (no criticism!). A lot of #ifdefs will have to be added.
Then there are the "smartmontools" which are written in a highly modular way, aiming for portability from the very beginning. The smartmontools deal with S.M.A.R.T. data, but adding some options to get/set APM (Advanced Power Management), AAM (Advanced Acoustic Management) and caching attributes would be very easy from a technical and coding standpoint. I am afraid they just don't belong there. Otherwise someone would have done it long ago.
So I was back at square one today, wanting to mess around with my disks but not having a tool. So, I just wrote something.
Here are setapm.c and a corresponding setapm binary for s10u9 for configuring the APM feature attribute on disks and an almost identical setaam.c and setaam for setting the AAM feature attribute.
They are very minimalistic. They will not even read out the configuration values from the disk for you. You'll only be able to set them.
And this will only work on systems where the sata/ahci sd driver is used, not the legacy ata/cmdk driver, because for the latter the USCSI ioctl is not implemented.
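For the curious, the core idea behind such a tool can be sketched in a few lines of C. This is an illustration of the general technique, not the actual setapm.c: the ATA SET FEATURES command (0xEF) is wrapped in a SCSI ATA PASS-THROUGH(12) CDB, which the sd driver forwards to the disk.

```c
#include <stdint.h>
#include <string.h>

/*
 * Sketch of the idea behind setapm/setaam (not the actual source):
 * wrap an ATA SET FEATURES command in a SAT ATA PASS-THROUGH(12) CDB.
 * On Solaris this CDB would then be issued through the USCSICMD ioctl
 * (struct uscsi_cmd, <sys/scsi/impl/uscsi.h>) against the raw device.
 */

#define ATA_CMD_SET_FEATURES  0xEF
#define SETFEAT_ENABLE_APM    0x05  /* APM level goes in the count field */
#define SETFEAT_ENABLE_AAM    0x42  /* AAM level goes in the count field */
#define SAT_ATA_PASSTHRU_12   0xA1
#define SAT_PROTO_NONDATA     3     /* non-data protocol, byte 1, bits 1-4 */

/* Fill a 12-byte SAT CDB for SET FEATURES <feature>, count = level. */
static void build_set_features_cdb(uint8_t cdb[12], uint8_t feature,
                                   uint8_t level)
{
    memset(cdb, 0, 12);
    cdb[0] = SAT_ATA_PASSTHRU_12;
    cdb[1] = SAT_PROTO_NONDATA << 1;
    cdb[3] = feature;              /* ATA features register */
    cdb[4] = level;                /* ATA sector count register */
    cdb[9] = ATA_CMD_SET_FEATURES; /* ATA command register */
}
```

On Solaris the resulting CDB would go into the uscsi_cdb field of a struct uscsi_cmd, issued with ioctl(fd, USCSICMD, &cmd), which is exactly the path that is missing in the cmdk driver.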
Monday, February 28. 2011
"Enhanced" zpool statistics
The "zpool iostat" command shows some zpool statistics:
foo@bar> zpool iostat
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
ei01 123G 109G 1 1 19.7K 18.5K
ei02 46.8G 112G 2 0 335K 92.8K
---------- ----- ----- ----- ----- ----- -----
The meaning of these figures is not really documented in the man page, but easy to deduce: It is the average number of read and write requests per second and the average amount of data read and written per second, counted from boot (or, to be more correct, zpool import) time.
With an additional -v parameter, we get the same information not only on pool level, but also on device level:
foo@bar> zpool iostat -v
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
ei01 123G 109G 1 1 19.7K 18.5K
mirror 123G 109G 1 1 19.7K 18.5K
c0d1 - - 0 0 13.9K 18.5K
c1d0 - - 0 0 13.6K 18.5K
---------- ----- ----- ----- ----- ----- -----
ei02 46.8G 112G 2 0 335K 92.8K
c0d0s7 46.8G 112G 2 0 335K 92.8K
---------- ----- ----- ----- ----- ----- -----
If you want the raw data for this, then you have to help yourself. See the zpstat.c I have just written. Compile instructions inside (very simple!). Usage: zpstat <pool1> [<pool2> ... <pooln>]. For each given pool, it will iterate through the in-core vdev tree structure and for each vdev it will try to extract statistics information. Output looks like this:
ei02
type: 'root'
id: 0
vdev_stats.vs_timestamp: 693868 seconds
vdev_stats.vs_ops[ZIO_TYPE_NULL]: 1
vdev_stats.vs_ops[ZIO_TYPE_READ]: 1835587
vdev_stats.vs_ops[ZIO_TYPE_WRITE]: 646427
vdev_stats.vs_ops[ZIO_TYPE_FREE]: 0
vdev_stats.vs_ops[ZIO_TYPE_CLAIM]: 0
vdev_stats.vs_ops[ZIO_TYPE_IOCTL]: 7185
vdev_stats.vs_bytes[ZIO_TYPE_NULL]: 0
vdev_stats.vs_bytes[ZIO_TYPE_READ]: 238795578368
vdev_stats.vs_bytes[ZIO_TYPE_WRITE]: 65714164736
vdev_stats.vs_bytes[ZIO_TYPE_FREE]: 0
vdev_stats.vs_bytes[ZIO_TYPE_CLAIM]: 0
vdev_stats.vs_bytes[ZIO_TYPE_IOCTL]: 0
vdev_stats.vs_read_errors: 0
vdev_stats.vs_write_errors: 0
vdev_stats.vs_checksum_errors: 0
vdev_stats.vs_self_healed: 0
type: 'disk'
id: 0
path: '/dev/dsk/c0d0s7'
path: 'id1,cmdk@AMaxtor_6L200P0=L41FZEGH/h'
path: '/pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0:h'
vdev_stats.vs_timestamp: 693868 seconds
vdev_stats.vs_ops[ZIO_TYPE_NULL]: 1
vdev_stats.vs_ops[ZIO_TYPE_READ]: 1835587
vdev_stats.vs_ops[ZIO_TYPE_WRITE]: 646427
vdev_stats.vs_ops[ZIO_TYPE_FREE]: 0
vdev_stats.vs_ops[ZIO_TYPE_CLAIM]: 0
vdev_stats.vs_ops[ZIO_TYPE_IOCTL]: 7185
vdev_stats.vs_bytes[ZIO_TYPE_NULL]: 0
vdev_stats.vs_bytes[ZIO_TYPE_READ]: 238795578368
vdev_stats.vs_bytes[ZIO_TYPE_WRITE]: 65714164736
vdev_stats.vs_bytes[ZIO_TYPE_FREE]: 0
vdev_stats.vs_bytes[ZIO_TYPE_CLAIM]: 0
vdev_stats.vs_bytes[ZIO_TYPE_IOCTL]: 0
vdev_stats.vs_read_errors: 0
vdev_stats.vs_write_errors: 0
vdev_stats.vs_checksum_errors: 0
vdev_stats.vs_self_healed: 0
You get basically the same information as from "zpool iostat", but before averaging, so you can a) calculate an average I/O size, and b) extract the figures, do something (for a longer while), extract the figures again and get an average figure for exactly this time period.
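As a sketch of what "before averaging" buys you, here is how two zpstat snapshots could be turned into exact per-interval figures. The struct is a simplified stand-in for illustration, not the real vdev_stat_t from the ZFS headers:

```c
#include <stdint.h>

/*
 * Simplified stand-in for the statistics printed above (field names
 * follow the vdev_stats output; this is not the real vdev_stat_t).
 */
typedef struct {
    uint64_t timestamp;   /* seconds (vs_timestamp) */
    uint64_t read_ops;    /* vs_ops[ZIO_TYPE_READ] */
    uint64_t read_bytes;  /* vs_bytes[ZIO_TYPE_READ] */
} snap_t;

/* Average read I/O size over the interval between two snapshots. */
static uint64_t avg_read_size(const snap_t *a, const snap_t *b)
{
    uint64_t ops = b->read_ops - a->read_ops;
    return ops ? (b->read_bytes - a->read_bytes) / ops : 0;
}

/* Average read bandwidth (bytes/s) over the same interval. */
static uint64_t avg_read_bw(const snap_t *a, const snap_t *b)
{
    uint64_t secs = b->timestamp - a->timestamp;
    return secs ? (b->read_bytes - a->read_bytes) / secs : 0;
}
```

"zpool iostat" only ever shows the since-import average; with two raw snapshots the same two divisions give you the figures for exactly the interval you chose.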
DISCLAIMER:
a) This code uses private interfaces of Solaris. It is not portable across releases. In fact, you will have to compile it separately for each release.
b) This is not the proper way to seriously gather information for performance tuning. For that purpose, use real weapons, like "dtrace".
c) I have written this for my own educational purposes (to learn about the libzfs interfaces and nvpairs and ZFS internals in general). I do not claim fitness for any particular purpose...
Thursday, February 17. 2011
Checking physical sector size of disks on Solaris
I have just written a small one-fifty-liner to check the physical block size of disk devices on Solaris. I have done this to show that the WD20EARS does indeed (erroneously!) report a physical sector size of 512 bytes.
See: blocksize.c (Source) and blocksize (binary, compiled on Solaris 10/x86).
Note that on the x86 platform, this will only work if the disks are attached to the sd driver (because the DKIOCGMEDIAINFOEXT ioctl is implemented only there and not in the cmdk driver). This will be the case if you use an AHCI controller supported by ahci(7d).
In my case (this is on 5.10 Generic_142910-17):
root@ilbirs> format -e </dev/null | egrep "[0-9]\."
0. c0t0d0 <DEFAULT cyl 30390 alt 2 hd 255 sec 63>
1. c0t1d0 <ATA-Hitachi HDS72107-A70M-698.64GB>
2. c0t3d0 <ATA-WDC WD20EARS-00M-AB50-1.82TB>
3. c0t4d0 <ATA-Hitachi HDS72107-A70M-698.64GB>
root@ilbirs> ./blocksize /dev/rdsk/c0t3d0
dkmp.dki_capacity = 3907029167
dkmp.dki_lbsize = 512
dkmp_ext.dki_capacity = 3907029167
dkmp_ext.dki_lbsize = 512
dkmp_ext.dki_pbsize = 512
root@ilbirs> ./blocksize /dev/rdsk/c0t4d0
dkmp.dki_capacity = 1465149167
dkmp.dki_lbsize = 512
dkmp_ext.dki_capacity = 1465149167
dkmp_ext.dki_lbsize = 512
dkmp_ext.dki_pbsize = 512
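The (dki_lbsize, dki_pbsize) pair printed above can be classified with a small helper like the following. This is a hypothetical addition for illustration, not part of blocksize.c:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of blocksize.c) that classifies the
 * (dki_lbsize, dki_pbsize) pair reported by DKIOCGMEDIAINFOEXT. Note
 * that a drive like the WD20EARS shows up as "512n" here precisely
 * because it misreports its physical sector size.
 */
static const char *sector_format(uint32_t lbsize, uint32_t pbsize)
{
    if (lbsize == 512 && pbsize == 512)
        return "512n";  /* native 512 bytes -- or a lying 4K drive */
    if (lbsize == 512 && pbsize == 4096)
        return "512e";  /* 4K physical behind 512-byte emulation */
    if (lbsize == 4096 && pbsize == 4096)
        return "4Kn";   /* 4K native */
    return "unusual";
}
```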
The above WD20EARS is revision "00MVWB0". I have another one at revision "00J2GB0", unfortunately hanging on a controller unsupported by ahci(7d).
If anybody has a WDxxEARS correctly reporting a dki_pbsize = 4096, I would love to hear about it.
Edit:
If anybody has a Samsung HD204UI, I would also like to know if this disk correctly reports the physical block size.
Wednesday, February 2. 2011
Broadcom 5721 (bge) on Solaris 10 09/10 (s10u9)
I am in the process of trying out Solaris 10 09/10 (s10u9) on my machines, and encountered problems on my Dell PowerEdge 840 Server. The NIC on that box is a Broadcom 5721.
The NIC does come up, but only receives broadcast and multicast packets, not unicast ones.
As a workaround, I am using the driver from Solaris 10 10/09 (s10u8), which works fine.
Update 02.02.2011:
Problem solved!! I tried to install Solaris 11 Express on this machine and again the NIC wouldn't work. This time I tried a bit harder and found out that I had disabled the PXE Extensions for the NIC in the BIOS. After switching the PXE Extensions back on, it works!
Must be some subtle initialization problem.
Tuesday, February 1. 2011
Problem compiling httpd-2.2.17 with gcc-4.5.2 on Solaris x86
Until recently, some nasty pointer arithmetic/type conversion in httpd led to a miscompilation by newer gcc versions on x86. See here and here for more information.
This patch is already included in httpd-2.2.17.
Something still seems to be wrong, however. When compiling with "-O2" and gcc-4.5.2, httpd would still fail for me (in https processing, http was doing fine). I tried one of the aforementioned workarounds (using "-O2 -fno-strict-aliasing") and the problem vanished.
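For illustration, the class of bug behind such miscompilations usually looks like this. A minimal sketch, not httpd's actual code:

```c
#include <stdint.h>
#include <string.h>

/*
 * Reading an object's representation through an incompatible pointer
 * type violates C's strict-aliasing rule; gcc at -O2 is entitled to
 * reorder or drop such accesses. -fno-strict-aliasing tells the
 * compiler to assume any pointer may alias; the portable fix is to
 * copy the bytes with memcpy instead.
 */

/* Undefined behavior under -O2: a float accessed as a uint32_t. */
static uint32_t bits_of_float_broken(float f)
{
    return *(uint32_t *)&f;   /* aliasing violation */
}

/* Well-defined alternative: copy the object representation. */
static uint32_t bits_of_float(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

The broken variant often appears to work until a new optimizer version exploits the aliasing assumption, which matches the "fails only at -O2, only in some code paths" symptom described above.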
Monday, January 31. 2011
Patch for "openssl" on Solaris x86
Whenever I build openssl from source (like today for openssl-1.0.0c) I am annoyed by the problem described in section E.3 of this document.
The recommended workaround is to build custom versions of the values-X[act].o objects, so that gcc links every program against these instead of their original versions in /usr/lib. They contain an (admittedly nice) hack to jump over the zeroes (instead of NOPs) inserted by the Sun linker. I didn't like that. A broad workaround for an isolated problem... In fact, there are voices saying that this behaviour of Solaris ld is a feature, not a bug.
I prefer to patch openssl, like this:
diff -ur openssl-1.0.0c~/crypto/perlasm/x86gas.pl openssl-1.0.0c/crypto/perlasm/x86gas.pl
--- openssl-1.0.0c~/crypto/perlasm/x86gas.pl 2008-12-17 20:56:47.000000000 +0100
+++ openssl-1.0.0c/crypto/perlasm/x86gas.pl 2011-01-31 17:22:20.551210508 +0100
@@ -211,7 +211,6 @@
.section .init
call $f
jmp .Linitalign
-.align $align
.Linitalign:
___
}
I.e., simply remove the alignment at the end of the custom .init section in the assembler files (which are generated via a preprocessing script written in perl).
Tuesday, January 25. 2011
Patch for "ffmpeg" on Solaris
For a long time I had trouble building and running newer versions of ffmpeg on Solaris. It would throw errors at me like:
[NULL @ 8be5140] [Eval @ 8046b9c] Invalid chars 'x01000004' at the end of expression '0x01000004'
[NULL @ 8be5140] Unable to parse option value "0x01000004"
The processing of the command line parameters seemed to be botched. The following patch fixes the problem for me:
--- libavutil/opt.c~ 2010-09-29 23:42:03.000000000 +0200
+++ libavutil/opt.c 2010-10-15 17:42:25.051174575 +0200
@@ -240,7 +240,7 @@
if (o_out) *o_out= o;
switch (o->type) {
- case FF_OPT_TYPE_FLAGS: snprintf(buf, buf_len, "0x%08X",*(int *)dst);break;
+ case FF_OPT_TYPE_FLAGS: snprintf(buf, buf_len, "%u", *(unsigned*)dst);break;
case FF_OPT_TYPE_INT: snprintf(buf, buf_len, "%d" , *(int *)dst);break;
case FF_OPT_TYPE_INT64: snprintf(buf, buf_len, "%"PRId64, *(int64_t*)dst);break;
case FF_OPT_TYPE_FLOAT: snprintf(buf, buf_len, "%f" , *(float *)dst);break;
Tuesday, January 11. 2011
Modified zpool program for newer Solaris versions
A few months ago I published a modified zpool program to create pools with a higher ASHIFT value, suitable for disks with 4k sectors. I expected the program compiled on s10u8 to be upward compatible. Apparently, it isn't.
To compile "zpool" on s10u9, I used the sources from OpenSolaris b138. The instructions for compiling are mostly unchanged from s10u8. The only required additional step is to compile and link against /usr/src/cmd/stat/common/timestamp.c. I am omitting further details.
For Solaris 11 Express, I used b147 as the basis. I was unable to find a complete on-src.tar.bz2 for this release, but had to pull the sources from hg.openindiana.org via hg (mercurial). I then compiled with:
cd /tmp
mkdir -p usr/src
cd ~ftp/pub/OpenSolaris/hg.openindiana.org/onnv-gate/usr/src
find \
cmd/stat/common \
common/zfs \
cmd/zpool \
lib/libuutil/common \
lib/libdiskmgt/common \
| cpio -pmdv /tmp/usr/src
cd /tmp/usr/src/cmd/stat/common
gcc -O2 -DTEXT_DOMAIN='"en_US"' -c timestamp.c
cd /tmp/usr/src/cmd/zpool
ln -s /usr/lib/libuutil.so.1 libuutil.so
gcc -O2 \
-Dzpool_rewind_policy_t=zpool_load_policy_t \
-DZPOOL_REWIND_REQUEST=ZPOOL_LOAD_REWIND \
-DZPOOL_REWIND_POLICY=ZPOOL_LOAD_POLICY \
-DZPOOL_REWIND_REQUEST_TXG=ZPOOL_LOAD_REWIND_TXG \
-DZPOOL_NO_REWIND=ZPOOL_NORMAL_LOAD \
-DTEXT_DOMAIN='"en_US"' \
-I/tmp/usr/src/cmd/stat/common \
-I/tmp/usr/src/common/zfs \
-I/tmp/usr/src/lib/libuutil/common \
-I/tmp/usr/src/lib/libdiskmgt/common \
-c *.c
gcc -o zpool *.o ../stat/common/timestamp.o \
-L. \
-lzfs \
-lnvpair \
-ldevid \
-lefi \
-ldiskmgt \
-luutil \
-lumem \
-L/lib -lcryptoutil
The modification to zpool_vdev.c is always the same (at line 474/475):
--- zpool_vdev.c.orig 2011-01-11 16:01:14.906720955 +0100
+++ zpool_vdev.c 2011-01-11 16:39:00.334827548 +0100
@@ -471,6 +471,7 @@
verify(nvlist_add_string(vdev, ZPOOL_CONFIG_PATH, path) == 0);
verify(nvlist_add_string(vdev, ZPOOL_CONFIG_TYPE, type) == 0);
verify(nvlist_add_uint64(vdev, ZPOOL_CONFIG_IS_LOG, is_log) == 0);
+ verify(nvlist_add_uint64(vdev, ZPOOL_CONFIG_ASHIFT, 12) == 0);
if (strcmp(type, VDEV_TYPE_DISK) == 0)
verify(nvlist_add_uint64(vdev, ZPOOL_CONFIG_WHOLE_DISK,
(uint64_t)wholedisk) == 0);
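The hardcoded 12 in the patch is simply the base-2 logarithm of the 4096-byte sector size (the default of 9 corresponds to 512-byte sectors). As a sketch, with a hypothetical helper that is not part of the zpool sources:

```c
#include <stdint.h>

/*
 * ashift is the base-2 logarithm of the sector size ZFS aligns to:
 * 512-byte sectors give ashift 9, 4096-byte sectors give ashift 12
 * (the value hardcoded in the patch above). Hypothetical helper,
 * not part of the zpool sources.
 */
static uint64_t ashift_for_sector(uint64_t sector)
{
    uint64_t a = 0;
    while ((1ULL << a) < sector)
        a++;
    return a;
}
```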
For your convenience, I offer pre-compiled versions of the modified zpool program for s10u8, s10u9 and Solaris 11 Express.
Sunday, October 3. 2010
Building "unrar" on Solaris x86
When building "unrar" from source, do not forget to set "-DALLOW_NOT_ALIGNED_INT". This provides a huge performance benefit for encrypted files.
I use:
make -f makefile.unix CXX=g++ CXXFLAGS="-O2 -march=pentium3" DEFINES="-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DLITTLE_ENDIAN -DALLOW_NOT_ALIGNED_INT" STRIP=/bin/true DESTDIR=/usr/local
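A sketch of what a flag like ALLOW_NOT_ALIGNED_INT typically toggles; this illustrates the technique, it is not unrar's actual source:

```c
#include <stdint.h>

/*
 * Reading a little-endian 32-bit value from a byte buffer: either byte
 * by byte (safe on every CPU and alignment), or with a direct load,
 * which is faster but only correct on little-endian CPUs that tolerate
 * unaligned access, such as x86. A flag like ALLOW_NOT_ALIGNED_INT
 * usually selects the second path.
 */
static uint32_t get_le32_portable(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

#ifdef ALLOW_NOT_ALIGNED_INT
static uint32_t get_le32_fast(const unsigned char *p)
{
    return *(const uint32_t *)p;   /* unaligned load, x86-only shortcut */
}
#endif
```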
Sunday, September 19. 2010
Solaris and the new 4K-Sector-Disks (e.g. WDxxEARS) / Part 4
Solaris 10 09/10 (a.k.a. s10u9) has arrived, and so have my two WD20EARS disks. Unfortunately, the situation regarding the ashift value is pretty much unchanged from a user's point of view.
The latest Solaris release does seem to contain the changes described in PSARC 2008/769; however, it still creates zpools on a WD20EARS drive (with 4K physical sectors) with an ashift value of 9.
I am not sure what the reason for this is. Most likely the WD drives do not tell the truth even about their physical sector size. What a pity. So, my workaround (the modified zpool program) is still necessary. It works on s10u9 without changes.
There is one more issue with these disks. Allegedly they unload their heads after an idle period of only 8 seconds. Many users have reported very high head load/unload counts as reported in S.M.A.R.T. data. As a precaution, I have put
set zfs:zfs_txg_timeout = 5
into /etc/system. The default in s10u9 is still 30 seconds, although the latest OpenSolaris code (as of mid-August, when Oracle stopped their putbacks) also sets it to 5 seconds. This should keep the disk busy in case of sporadic updates.