
Using DAOS Estimator

The DAOS Estimator (daosest.exe) is a tool for planning the rollout of DAOS on the Domino 8.5 server. The tool iterates through all the requested databases, scanning for documents with attachments. It keeps a list of all the attachments so that it can estimate savings based on duplicate attachments found within each database as well as across all databases.

Install:

How you install the DAOS Estimator depends on the platform of your Domino server. Download daosest for the correct platform, then put a copy of it in the Domino executable directory. The DAOS Estimator is built as an SDK application, so it may be run against any version of the Domino server. Finally, make sure that the permissions are set correctly.

Run:

You can run the DAOS Estimator with the server up or down. To run it with the server up, type 'lo daosest <parameters>'. To run it with the server down, go to a command prompt and cd to the Domino directory. Then type daosest -h to get the help screen, as shown here:

IBM DAOS Savings Estimator tool, Version 1.0
Copyright (c) IBM 2008. All rights reserved.
daosest <directory or filename> [OPTIONS]
  -h              display this message
  -o <filename>   output to file
  -v              verbose, displays file information
Note: Default input path is data directory.

Note: See bottom of document for new features added in v1.4.

The default is to run against the data directory as defined in notes.ini. You may also run it against a subdirectory or against individual NSF files.

The verbose flag outputs individual file information to the console or output file. Be aware that this generates a lot of information, most of which is unnecessary.

You can send the output to a file using the -o option. The advantage of using this option rather than piping the output to a file is that information is still sent to the console, so the user can see which database is currently being analyzed. The file output is also slightly wider, which makes it easier to read.
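As a rough illustration of the command line described above, the following Python sketch assembles a daosest invocation from the documented switches only (-o and -v). The function name, the `mail` subdirectory, and the `daos_report.txt` file name are hypothetical and chosen for the example.

```python
def build_daosest_command(target=None, output_file=None, verbose=False):
    """Build an argument list of the form: daosest <directory or filename> [OPTIONS].

    Only switches shown on the help screen are used. With no target,
    daosest falls back to its default input path (the data directory).
    """
    command = ["daosest"]
    if target:
        command.append(target)
    if output_file:
        command += ["-o", output_file]   # write the report to a file
    if verbose:
        command.append("-v")             # per-file details, very chatty
    return command


# Hypothetical example: analyze the "mail" subdirectory and keep a report file.
print(" ".join(build_daosest_command("mail", output_file="daos_report.txt")))
```

With the server down, a list built this way could be handed to `subprocess.run` from the Domino directory; that step is left out here because it depends on the local installation.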

Output:

The first section displays per database information.

Database Name Orig NSF New NSF  Num    DAOS   Dup    Compr    Space    DAOS Ob
              Size     Size     Files  Files  Files  Size     Savings  Size
============= ======== ======== ====== ====== ====== ======== ======== ========

Database Name   file name of the database.
Orig NSF Size   current size of the database on disk.
New NSF Size    estimated size of the database with attachments removed.
Num Files       total number of attachments found in the database.
DAOS Files      total number of attachments that are DAOS eligible.
Dup Files       total duplicate files found in database.
Compr Size      total compressed size of all attachments in database.
Space Savings   total space savings from database which is the total size of all duplicate attachments.
DAOS Ob Size    total size of all attachments in the database excluding duplicates.

Note that all values are rounded to Kilobytes. The Orig NSF Size should be approximately equal to the DAOS Obj size plus the Space Savings plus the New NSF Size.

Example Output:

IBM DAOS Savings Estimator tool, Version 1.0
Copyright (c) IBM 2008. All rights reserved.

Database Name Orig NSF New NSF  Num    DAOS   Dup    Compr    Space    DAOS Ob
              Size     Size     Files  Files  Files  Size     Savings  Size
============= ======== ======== ====== ====== ====== ======== ======== ========
l\k######.nsf   3.1 GB 989.8 MB   6473   6473   2203   2.2 GB 424.7 MB   1.7 GB
\k#######.nsf   4.4 GB   1.0 GB   5083   5083   1492   3.4 GB 879.6 MB   2.5 GB
\k#######.nsf 136.5 MB 124.9 MB     87     87     12  11.6 MB   6.7 MB   4.9 MB
mail\k###.nsf   1.3 GB 330.2 MB   2437   2437    719 987.8 MB 301.4 MB 686.4 MB
mail\k###.nsf 550.0 MB 447.7 MB    824    824    210 102.3 MB  18.6 MB  83.8 MB
il\k#####.nsf 660.8 MB 369.0 MB   1097   1097    250 291.8 MB  32.8 MB 259.0 MB
l\l######.nsf   2.5 GB   1.3 GB   6096   6096   2391   1.2 GB 415.2 MB 781.7 MB
ail\l####.nsf   2.3 GB   1.3 GB   2839   2839    877 976.3 MB 172.4 MB 804.0 MB
\l#######.nsf   2.6 GB   2.0 GB   1592   1592    417 672.1 MB 159.4 MB 512.6 MB
il\l#####.nsf 171.0 MB 148.5 MB     57     57      5  22.5 MB   1.3 MB  21.2 MB
l\m######.nsf   1.5 GB 528.2 MB   2424   2424    848   1.0 GB 196.4 MB 848.1 MB
\m#######.nsf   7.9 GB   2.3 GB  17736  17736   5266   5.7 GB   1.0 GB   4.6 GB
\m#######.nsf 179.5 MB 140.6 MB    246    246    102  38.9 MB  12.9 MB  26.0 MB
ail\m####.nsf 414.0 MB 211.6 MB    221    221     41 202.4 MB  35.4 MB 167.0 MB
il\m#####.nsf   3.1 GB 763.7 MB   7749   7749   2845   2.3 GB 527.8 MB   1.8 GB
\m#######.nsf   1.3 GB 520.8 MB   3919   3919   1616 792.5 MB 252.9 MB 539.6 MB
l\m######.nsf   1.4 GB 793.1 MB   2678   2678    760 662.4 MB 157.0 MB 505.5 MB
\M#######.nsf   3.6 GB   1.5 GB   8231   8231   2493   2.1 GB 374.9 MB   1.7 GB
\m#######.nsf 375.5 MB 338.0 MB    849    849    283  37.5 MB   3.8 MB  33.7 MB
ail\m####.nsf   6.6 GB   3.3 GB  11822  11822   4399   3.3 GB 974.0 MB   2.4 GB
\m#######.nsf 116.5 MB 114.4 MB     16     16      9   2.1 MB   1.6 MB   0.4 KB
l\M######.nsf 902.0 MB 663.9 MB    505    505     90 238.1 MB  47.3 MB 190.8 MB
\m#######.nsf   1.5 GB 514.2 MB   2180   2180    558   1.0 GB 124.9 MB 905.6 MB
il\m#####.nsf   9.9 GB   3.1 GB  15220  15220   4370   6.8 GB   1.4 GB   5.3 GB
il\p#####.nsf   2.8 GB 718.7 MB   5235   5235   1875   2.1 GB 548.3 MB   1.5 GB
il\p#####.nsf   3.9 GB   1.6 GB  12173  12173   3303   2.2 GB 508.8 MB   1.7 GB
l\p######.nsf   4.1 GB 718.8 MB   4894   4894   1333   3.4 GB 685.4 MB   2.7 GB
o########.nsf   2.1 MB   1.9 MB     45     45      0   0.3 KB   0.0 KB   0.3 KB
\p#######.nsf   1.1 GB 862.8 MB    922    922    336 281.0 MB  59.9 MB 221.0 MB
\r#######.nsf   1.5 GB 862.4 MB   1437   1437    241 660.1 MB 195.7 MB 464.3 MB
l\r######.nsf   2.1 GB   1.1 GB   3462   3462    918   1.0 GB 226.6 MB 805.5 MB
l\r######.nsf 515.3 MB 204.8 MB    891    891    266 310.4 MB  43.7 MB 266.8 MB
ail\r####.nsf   8.8 GB   2.0 GB  79577  79577  20768   6.9 GB   2.1 GB   4.8 GB
\r#######.nsf   4.8 GB   1.9 GB  14850  14850   6602   2.9 GB   1.1 GB   1.8 GB
il\r#####.nsf   1.8 GB 903.4 MB   3956   3956    848 936.4 MB 128.1 MB 808.3 MB
\r#######.nsf   5.4 GB   2.5 GB  11858  11858   3114   2.9 GB 791.9 MB   2.1 GB
mail\r###.nsf 263.8 MB 243.5 MB    146    146     12  20.2 MB   1.9 MB  18.3 MB
\r#######.nsf  17.8 GB   2.6 GB  99898  99898  30967  15.2 GB   4.7 GB  10.6 GB
il\r#####.nsf   2.6 GB 844.5 MB   1902   1902    649   1.8 GB 444.2 MB   1.4 GB
\s#######.nsf   3.2 GB 1004.2 M   6299   6299   2955   2.2 GB 681.7 MB   1.5 GB
\s#######.nsf   3.2 GB   1.6 GB   7594   7594   3305   1.6 GB 532.4 MB   1.1 GB
l\s######.nsf   3.9 GB   1.7 GB   5699   5699   1840   2.2 GB 527.5 MB   1.7 GB
l\s######.nsf   3.8 GB   1.3 GB   3953   3953   1221   2.6 GB 476.7 MB   2.1 GB
l\s######.nsf   8.4 GB   2.7 GB   1687   1687    297   5.7 GB   1.8 GB   3.9 GB
ail\s####.nsf   3.8 GB   1.7 GB   6453   6453   1747   2.1 GB 388.1 MB   1.7 GB
il\t#####.nsf 672.3 MB 219.1 MB   1756   1756    670 453.1 MB 133.5 MB 319.6 MB
l\t######.nsf   4.0 GB   3.2 GB   1095   1095    242 764.4 MB 177.4 MB 587.0 MB
mail\t###.nsf   2.6 GB   1.0 GB   3306   3306   1369   1.6 GB 556.5 MB   1.1 GB
\t#######.nsf   2.4 GB   1.9 GB   3544   3544   1497 480.5 MB 134.9 MB 345.6 MB
il\t#####.nsf   6.1 GB   3.2 GB   9243   9243   3131   2.9 GB 603.0 MB   2.3 GB
\t#######.nsf   6.5 GB   2.6 GB   8155   8155   2801   3.9 GB 892.3 MB   3.0 GB
il\t#####.nsf   3.5 GB   1.4 GB   1999   1999    611   2.0 GB 374.6 MB   1.6 GB
il\t#####.nsf   2.1 GB 559.0 MB   4692   4692   1829   1.5 GB 282.7 MB   1.3 GB
l\t######.nsf   4.8 GB   1.0 GB   3123   3123    832   3.8 GB 434.4 MB   3.4 GB
ail\w####.nsf   2.6 GB 383.8 MB   2377   2377    894   2.3 GB 344.2 MB   1.9 GB
\weis####.nsf   2.6 GB 567.5 MB   2277   2277    710   2.0 GB 512.9 MB   1.5 GB
il\w#####.nsf   2.8 GB 514.5 MB   3956   3956   1261   2.3 GB 404.7 MB   1.9 GB
il\w#####.nsf   5.8 GB 661.1 MB   7569   7569   2423   5.2 GB   1.4 GB   3.8 GB
mail\y###.nsf   1.4 GB 780.5 MB   3452   3452   1473 676.5 MB 162.5 MB 514.0 MB
ail\z####.nsf  11.8 MB   9.7 MB      8      8      0   2.0 MB   0.0 KB   2.0 MB

For the first database:

l\k######.nsf   3.1 GB 989.8 MB   6473   6473   2203   2.2 GB 424.7 MB   1.7 GB

k######.nsf is 3.1 GB on disk.

The approximate size after DAOS is enabled would be 989.8 MB. There are 6473 attachments in the database, all of which are eligible for DAOS. There are 2203 duplicate files representing 424.7 MB. The amount of disk space needed for the DAOS attachments is 1.7 GB. Adding these up, 1.7 GB + (424.7 MB + 989.8 MB)/1024 is approximately 3.1 GB. The small difference comes from the rounding that takes place in converting everything to KB and the rounding for display.

Each database that the tool ran against is displayed. Remember, at this point, the DAOS savings is only due to duplicate attachments within the individual database, not across databases.
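The consistency check described above (Orig NSF Size is roughly DAOS Ob Size plus Space Savings plus New NSF Size) can be sketched in Python using the figures for the first database. The helper name `to_kb` is only for illustration, and the inputs are the rounded display values, so a small gap is expected.

```python
KB_PER_MB = 1024
KB_PER_GB = 1024 * 1024


def to_kb(value, unit):
    """Convert a displayed size such as (3.1, "GB") to kilobytes."""
    factors = {"KB": 1, "MB": KB_PER_MB, "GB": KB_PER_GB}
    return value * factors[unit]


# Displayed values for the first database in the example output.
orig_nsf = to_kb(3.1, "GB")
new_nsf = to_kb(989.8, "MB")
space_savings = to_kb(424.7, "MB")
daos_ob = to_kb(1.7, "GB")

# Orig NSF Size should be close to DAOS Ob Size + Space Savings + New NSF Size.
estimated_orig = daos_ob + space_savings + new_nsf
relative_gap = abs(estimated_orig - orig_nsf) / orig_nsf

print(f"Reported Orig NSF Size : {orig_nsf / KB_PER_GB:.2f} GB")
print(f"DAOS + savings + new   : {estimated_orig / KB_PER_GB:.2f} GB")
print(f"Relative gap           : {relative_gap:.1%}")
```

The same check can be applied to any other row of the per database section to spot rows where the estimate and the on-disk size diverge noticeably.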

The next section in the output is the Summary. It contains information across all databases against which the estimator was run.

Example:

Summary:
Total DB's analyzed:                    60
Total DB's skipped due to errors:       0
Total Size of NSF's Examined:           188.2 GB
Total Attachments found:                429864
Total Duplicate Attachments found:      194499
Total DAOS Eligible Attachments:        429864
Estimated Size of DAOSified NSF's:      67.5 GB
Estimate Size of DAOS dir:              90.8 GB
Total Disk Savings:                     38.8 GB

Compression Statistics:
None:                                   257877
Huffman:                                150278
LZ1:                                    21704
Huffman on LZ1 servers:                 0

For the above, a total of 60 databases were analyzed. Of those 60 databases, all were able to be opened and analyzed. This number is important to look at to determine how accurate the results are. For example, if half the databases could not be analyzed, the results could be far off.

The total attachments found is the total number across all databases, including duplicates. There were 194,499 duplicates found across all the databases. The Estimated Size of DAOSified NSF's is the estimated size of all the databases that the tool was run against. The Estimated Size of DAOS dir is the estimated size of all the DAOS eligible attachments found, excluding duplicates. The Total Disk Savings is the total amount of disk space saved by eliminating duplicates across all databases. The compression statistics are provided for informational purposes only.

Histogram:

The histogram is provided to give a quick graphical representation of the distribution of the attachment sizes across all the databases.

==============================================================================
|                 Size Distribution of All Attachments Found                 |
==============================================================================
|      |      |      |      |161416|      |      |      |      |      |      |
|      |      |      |      |161416|      |      |      |      |      |      |
|      |      |      |      |161416|      |      |      |      |      |      |
|      |      |      |      |161416|90946 |      |      |      |      |      |
|      |      |      |      |161416|90946 |      |      |      |      |      |
|66362 |      |      |      |161416|90946 |      |      |      |      |      |
|66362 |      |33629 |35311 |161416|90946 |      |      |      |      |      |
|66362 |19744 |33629 |35311 |161416|90946 |18550 | 3458 | 426  |  22  |  0   |
==============================================================================
| 0.0% | 0.1% | 0.3% | 0.6% | 7.5% |20.4% |31.5% |25.1% |11.4% | 3.2% | 0.0% |
==============================================================================
|  4k  |  8k  | 16k  | 32k  | 64k  | 1MB  | 5MB  | 20MB |100MB | 1GB  | >1GB |
==============================================================================
- Histogram shows the number of attachments contained in each bucket.
- Percentages are the percent of total disk space of all attachments per bucket.

The numbers and height of the columns represent the number of attachments that fall into that bucket. So for the above data, there are 66,362 attachments that are between 0 and 4KB in size. The percentage below each column represents the percentage of disk space that those attachments are using. This data can help you decide what the best DAOS minimum size would be for your environment. The idea is to maximize the disk space savings while minimizing the number of files.

The final section reiterates the relationship between the DAOS minimum size and its effect on the number of files and the amount of disk space used. This helps to determine the optimum DAOS minimum size in your environment.

DAOS Minimum Size versus number of NLO's and Disk Space:

  0.0 KB will result in 429864 .nlo files using 120.7 GB
  4.0 KB will result in 363502 .nlo files using 120.6 GB
  8.0 KB will result in 343758 .nlo files using 120.5 GB
 16.0 KB will result in 310129 .nlo files using 120.2 GB
 32.0 KB will result in 274818 .nlo files using 119.4 GB
 64.0 KB will result in 113402 .nlo files using 110.4 GB
  1.0 MB will result in  22456 .nlo files using  85.8 GB
  5.0 MB will result in   3906 .nlo files using  47.9 GB
 20.0 MB will result in    448 .nlo files using  17.6 GB
100.0 MB will result in     22 .nlo files using   3.8 GB
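The file counts in this table follow from the histogram: each minimum size drops the attachments in that bucket and in every smaller one. The Python sketch below reproduces the counts under that assumption and adds the share of attachment space each setting retains. The bucket counts and GB figures are copied from the example output; the function and variable names are illustrative only.

```python
# Attachment counts per histogram bucket, keyed by the bucket's upper bound.
BUCKETS = [
    ("4 KB", 66362), ("8 KB", 19744), ("16 KB", 33629), ("32 KB", 35311),
    ("64 KB", 161416), ("1 MB", 90946), ("5 MB", 18550), ("20 MB", 3458),
    ("100 MB", 426), ("1 GB", 22), (">1 GB", 0),
]

# Disk space (GB) reported for each candidate minimum size.
SPACE_GB = {
    "0 KB": 120.7, "4 KB": 120.6, "8 KB": 120.5, "16 KB": 120.2,
    "32 KB": 119.4, "64 KB": 110.4, "1 MB": 85.8, "5 MB": 47.9,
    "20 MB": 17.6, "100 MB": 3.8,
}


def nlo_counts(buckets):
    """Return {minimum size: number of .nlo files}.

    Assumption: a minimum size equal to a bucket's upper bound excludes
    every attachment in that bucket and in all smaller buckets.
    """
    remaining = sum(count for _, count in buckets)
    counts = {"0 KB": remaining}
    for label, count in buckets:
        if label.startswith(">"):
            break  # the open-ended bucket has no upper bound to use as a minimum
        remaining -= count
        counts[label] = remaining
    return counts


counts = nlo_counts(BUCKETS)
total_files = counts["0 KB"]
total_space = SPACE_GB["0 KB"]

for label, space in SPACE_GB.items():
    kept_space = 100 * space / total_space
    avoided = total_files - counts[label]
    print(f"{label:>6}: {counts[label]:>6} .nlo files, "
          f"{kept_space:5.1f}% of attachment space, {avoided:>6} files avoided")
```

For the 64 KB row this gives 113402 files, 316462 files avoided, and roughly 91.5% of the attachment space still stored in DAOS.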

So for this server the optimum DAOS minimum size may be 64KB, because it would include about 91% of the disk space occupied by attachments (110.4 GB of 120.7 GB) while eliminating the need for 316,462 NLO files.

DAOS Estimator Options

-i <filename>
The -i switch takes a filename which contains a list of databases to be analyzed. The file may contain absolute paths as well as paths relative to the data directory. Each name must be followed by a carriage return. For example, create a file called files.ind that contains the location of databases:

/local/notesdata/mail/UserA.nsf
/local/notesdata/mail/UserB.nsf
/local/notesdata/mail/UserC.nsf
/local/notesdata/mail/UserD.nsf

-c
The -c switch causes the DAOS Estimator to write the attachment data out to a delimited text file to be analyzed later with the -a switch. Note that all information except duplicate attachment data is calculated and displayed in this mode. Using this switch reduces the time to run on the server by as much as 65%. Results will vary.

-a <filename>
There are two ways to use the -a switch.
1. The -a switch takes the filename of a .csv file containing the attachment data which was generated from a previous run using the -c switch.
2. When a filename with the extension '.ind' is passed in, the DAOS Estimator assumes that the file contains a list of .csv files to be processed. This allows one to divide the databases into several smaller runs using the -i and -c switches and then process them all together to get results across the whole set of databases. Also note that each file name must be followed by a carriage return.

-p <percent>
Estimate mode. The -p switch takes a percent value between 1 and 99 and uses this value to determine whether to run on each database or not. The default value is 50%. The DAOS Estimator then runs over all the databases specified, analyzing a percentage of them, and extrapolates the results of the run to the full set of databases. This mode is meant to speed up the DAOS Estimator so that an estimate can be obtained much faster, which is useful for large data sets.

How to select the Minimum size of object before Domino will store in DAOS using the DAOS Estimator output

a. When considering the minimum participation size, it is necessary to know the block size of the file system. Here is how to determine the block size:

Platform                        Command                                         Block Size reported as
Windows NTFS                    fsutil fsinfo ntfsinfo                          Bytes Per Cluster
Solaris                         df -g                                           Block Size
AIX (need to be super user)     lsfs -q                                         Block Size
Linux (need to be super user)   df -k (determine device name)                   Block size
                                dumpe2fs <device name> | grep -i 'block size'

About block size: The smaller the block size the less waste. Since it is unlikely that all the NLO files will be exact multiples of the block size, there will be some waste. To reduce waste, the minimum participation size needs to be a multiple of the file system block size. It should also be noted that smaller block sizes, which are beneficial for NLO files, are not beneficial for NSF files. With NSF files, the larger the block size the better the disk performance because NSF files are larger than NLO files. Consider creating a separate file system for the NLO files.
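To make the waste argument concrete, here is a small Python sketch that computes how much space a single file leaves unused in its last block. The 70,000 byte file size is a hypothetical value chosen for illustration, and the model assumes the file system allocates whole blocks only.

```python
def allocated_bytes(file_size, block_size):
    """Bytes occupied on disk when a file is stored in whole blocks."""
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks * block_size


def wasted_bytes(file_size, block_size):
    """Unused bytes in the last block of the file."""
    return allocated_bytes(file_size, block_size) - file_size


example_size = 70_000  # hypothetical NLO file size in bytes
for block_size in (512, 4096, 8192, 65536):
    waste = wasted_bytes(example_size, block_size)
    print(f"block size {block_size:>6}: {waste:>6} bytes wasted")
```

A file whose size is an exact multiple of the block size wastes nothing, while a large block size can waste almost a full block on a small file.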

b. Selecting the minimum participation size using the "DAOS Minimum Size versus number of NLOs and Disk Space" section. daosest reports a section with the number of NLO files and total NLO disk space that would be generated given minimum participation sizes of 0, 64KB, 128KB, 256KB, 512KB, 1MB, 2MB, 3MB, 4MB and 8MB. Here is an example of the section:

  0.0 KB will result in 2226347 .nlo files using 185.5 GB
 64.0 KB will result in 1092894 .nlo files using 175.7 GB
128.0 KB will result in  708403 .nlo files using 163.6 GB
256.0 KB will result in  422087 .nlo files using 145.9 GB
512.0 KB will result in  219833 .nlo files using 120.2 GB
  1.0 MB will result in   93628 .nlo files using  87.8 GB
  2.0 MB will result in   36576 .nlo files using  56.6 GB
  3.0 MB will result in   17499 .nlo files using  38.0 GB
  4.0 MB will result in    9717 .nlo files using  26.3 GB
  8.0 MB will result in    1576 .nlo files using   6.5 GB

The theoretical maximum would generate approximately 2.2 million files using 185 GB. Reviewing the information, a value in the range of 128KB-256KB as the minimum participation size would be recommended. Between 128KB and 256KB, there would be approximately 500,000 NLO files, which would take up about 150 GB of space. The result is a little less than a quarter of the number of NLO files that the theoretical maximum would require, and the disk space would be 80% of the maximum total size.

At 80% of the theoretical maximum benefit with only 25% of the files, there would be two additional benefits: 1) the disk backup would perform better and 2) the DAOS resync operation would be faster. DAOS resync has to enumerate all of the NLO files in the system as one of its steps, and the fewer files there are, the faster that part will run.

Another consideration is the file system block size. Assuming a block size of 8KB and a random assortment of file sizes, there will on average be waste at the rate of half a block per file. At 64KB, there will be about 1 million files, so the wasted space works out to about 4 GB (4KB * 1 million files). At 256KB, the wasted space would be approximately 1.7 GB (4KB * 422,000 files). Again, fewer files is better because there is less wasted space.

Lastly, if the yield is not as good as expected, it is much easier to tune the minimum participation size smaller than it is to tune it the other way (and clean up) if it yields too many NLO files.
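The trade-off described above can be tabulated with a short Python sketch. It takes the file counts and sizes from the example section, compares each setting with the theoretical maximum (0 KB), and applies the half-a-block-per-file waste estimate for an assumed 8 KB block size. The names are illustrative, and the waste figures use 1 GB = 1024 * 1024 KB, so they differ slightly from the round-number arithmetic in the text.

```python
# (minimum size label, .nlo file count, disk space in GB) from the example section.
ROWS = [
    ("0 KB", 2226347, 185.5), ("64 KB", 1092894, 175.7),
    ("128 KB", 708403, 163.6), ("256 KB", 422087, 145.9),
    ("512 KB", 219833, 120.2), ("1 MB", 93628, 87.8),
    ("2 MB", 36576, 56.6), ("3 MB", 17499, 38.0),
    ("4 MB", 9717, 26.3), ("8 MB", 1576, 6.5),
]

BLOCK_KB = 8  # assumed file system block size, as in the text


def average_waste_gb(nlo_files, block_kb=BLOCK_KB):
    """Expected slack space if each file wastes half a block on average."""
    return nlo_files * (block_kb / 2) / (1024 * 1024)


max_files, max_space = ROWS[0][1], ROWS[0][2]

for label, files, space in ROWS:
    print(f"{label:>6}: {100 * files / max_files:5.1f}% of files, "
          f"{100 * space / max_space:5.1f}% of space, "
          f"~{average_waste_gb(files):.2f} GB block waste")
```

Reading the output row by row shows where the file count drops much faster than the stored space, which is the range the recommendation above is aiming for.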
