emurangeupdate

emurangeupdate is the tool used to list, and optionally set, suitable range buckets. It is installed on the EMu server, and you must log in to the server as user emu to run it. The tool can:

  • print out suitable range buckets for examination
  • install EMu Registry entries to be used when updating the indexes
  • print a table of data distribution allowing manual configuration

The emurangeupdate usage message is:

Usage: emurangeupdate [-dip] [-qrv] [-mmin:max] [-nrecords] [[dbname][:column] ...]

where:

-d           use distribution based ranges [default]
-i           use interval based ranges
-mmin:max    minimum and maximum number of buckets to use [6:39]
-nrecords    records per bucket for distribution ranges [5000]
-p           use partition based ranges
-q           quiet mode, do not output progress
-r           update range Registry entries
-v           output data distribution table

The analysis can cover:

  • a single column in one table
  • all columns in one table
  • a single column in all tables
  • all columns in all tables

The format used to specify a column for analysis is dbname:column, where dbname is the name of the table to be analysed and column is the name of the column. The following combinations are allowed:

  • dbname - all columns in the table dbname will be analysed
  • :column - the column named column will be analysed in every table that contains it
  • dbname:column - the column named column in table dbname will be analysed

Any number of the above entries may be supplied to emurangeupdate. If no entry is given, all columns in all tables are examined.
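The three argument forms can be illustrated with a short Python sketch. It is not part of emurangeupdate; the names Target and parse_target are hypothetical and exist only to show how each form maps to a table and a column, with None standing for "all".

```python
from typing import NamedTuple, Optional


class Target(NamedTuple):
    """Hypothetical holder for one emurangeupdate argument."""
    table: Optional[str]   # None means "all tables"
    column: Optional[str]  # None means "all columns"


def parse_target(spec: str) -> Target:
    """Split a 'dbname', ':column' or 'dbname:column' argument."""
    table, _, column = spec.partition(":")
    # An empty part on either side of the colon means "all".
    return Target(table or None, column or None)


for spec in ("eparties", ":AdmDateModified", "eparties:AdmDateModified"):
    print(spec, "->", parse_target(spec))
```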

By default, emurangeupdate examines the data and prints suitable range buckets. The following is typical output from analysing the Date Modified field in the Parties table:

 emurangeupdate eparties:AdmDateModified
Processing eparties...
        Determining range columns...
        Checking registry entries...
        Exporting range data...
        Processing AdmDateModified...
                Range Buckets (distribution)
                =============
                2000-11-22
                2001-5-10
                2003-7-22
                2003-8-6
                2005-10-3
                2006-2-21

The output lists the recommended range bucket values. These values can be used in the Range Buckets Registry entry to set the range buckets for the Date Modified field. The required Registry entry would be:

System|Setting|Table|eparties|Range Buckets|AdmDateModified|2000-11-22;2001-05-10;2003-07-22;2003-08-06;2005-10-03;2006-02-21
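If you assemble the entry by hand, a small Python sketch such as the following may help. It is only an illustration: the function names pad_date and registry_entry are hypothetical, and the zero padding of month and day is an assumption based on the entry shown above, not a documented requirement.

```python
from typing import List


def pad_date(value: str) -> str:
    """Zero-pad a 'Y-M-D' date, e.g. '2001-5-10' becomes '2001-05-10'."""
    year, month, day = value.split("-")
    return f"{int(year):04d}-{int(month):02d}-{int(day):02d}"


def registry_entry(table: str, column: str, buckets: List[str]) -> str:
    """Join bucket values with ';' into a Range Buckets Registry entry."""
    values = ";".join(pad_date(b) for b in buckets)
    return f"System|Setting|Table|{table}|Range Buckets|{column}|{values}"


# Bucket values exactly as printed by emurangeupdate in the example above.
printed = ["2000-11-22", "2001-5-10", "2003-7-22",
           "2003-8-6", "2005-10-3", "2006-2-21"]
print(registry_entry("eparties", "AdmDateModified", printed))
```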

emurangeupdate can add the Registry entry for you if you specify the -r option on the command line:

 emurangeupdate -r eparties:AdmDateModified
Processing eparties...
        Determining range columns...
        Checking registry entries...
        Exporting range data...
        Processing AdmDateModified...
                Range Buckets (distribution)
                =============
                2000-11-22
                2001-5-10
                2003-7-22
                2003-8-6
                2005-10-3
                2006-2-21
        Registry entry updated...

If you want to perform some analysis of the data, you can use the -v option to have a data distribution table printed:

 emurangeupdate -v eparties:AdmDateModified
Processing eparties...
        Determining range columns...
        Checking registry entries...
        Exporting range data...
        Processing AdmDateModified...
                Value                  Count
                =====                  =====
                2000:11:22              1507
                2001:5:10                  1
                2003:7:22                  2
                2003:8:6                   1
                2003:8:26                  1
                2003:9:4                   2
                ...
                2007:12:27                 1
                2008:1:3                   1
                Distinct                  74
                Total                   2328
                Range Buckets (distribution)
                =============
                2000-11-22
                2001-5-10
                2003-7-22
                2003-8-6
                2005-10-3
                2006-2-21

The Value column contains a sorted list of all values from the Date Modified field. The Count column gives the number of occurrences of each value. At the end of the table, Distinct is the number of unique values and Total is the total number of values (including repeated values). With this information you can analyse the distribution yourself (a spreadsheet such as MS Excel may come in handy here) and determine suitable range buckets manually.
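As an alternative to a spreadsheet, the table can be read with a few lines of Python. The sketch below is an assumption-laden illustration, not the algorithm emurangeupdate uses: parse_rows and cumulative_share are hypothetical helpers, and SAMPLE holds only the six rows shown before the elision in the output above.

```python
from typing import List, Tuple

# The six value rows printed before the "..." in the example output.
SAMPLE = """\
2000:11:22              1507
2001:5:10                  1
2003:7:22                  2
2003:8:6                   1
2003:8:26                  1
2003:9:4                   2
"""


def parse_rows(text: str) -> List[Tuple[str, int]]:
    """Extract (value, count) pairs from -v output."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Keep only "value count" rows; this skips the header, the
        # underline, "...", and the Distinct and Total summary lines.
        if len(parts) == 2 and parts[1].isdigit() and ":" in parts[0]:
            rows.append((parts[0], int(parts[1])))
    return rows


def cumulative_share(rows: List[Tuple[str, int]]) -> List[Tuple[str, int, float]]:
    """Add the running fraction of all parsed records to each row."""
    total = sum(count for _, count in rows)
    running = 0
    result = []
    for value, count in rows:
        running += count
        result.append((value, count, running / total))
    return result


for value, count, share in cumulative_share(parse_rows(SAMPLE)):
    print(f"{value:<12} {count:>6} {share:7.2%}")
```

A listing like this makes skew easy to spot. In the full table above, for example, the single value 2000:11:22 accounts for 1507 of the 2328 records, which is worth knowing before choosing bucket boundaries by hand.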