TECHNICAL REPORTS



Original Documentation:

US Fish and Wildlife Service

The National Ecology Research Center digitized the Ecoregions of the Continents map (Robert G. Bailey, U.S. Department of Agriculture, Forest Service, Washington, 1989) at a scale of 1:30,000,000 from a paper source. Arc/Info Version 5.01 software was used to digitize the map in table inches using an Altek Model 34.3 tablet with a resolution of .001" (.0254 mm) and an accuracy of +/- .003" (+/- .08 mm).

The following information is extracted from documentation of November 27, 1990 for the "ECOREGIONS OF THE CONTINENTS DATA BASE" distributed with the original version of the dataset by the National Ecology Research Center, U.S. Fish and Wildlife Service, 4512 McMurray Avenue, Ft. Collins, CO 80525-3400:

"The center was unable to determine which [projection] was used.

Since the projection and specific parameters of the source map are unknown, it is not possible to accurately transform the ECOWRLD coverage to projected units. USERS OF THE ECOWRLD COVERAGE (in table inches) ARE ADVISED THAT THE NATIONAL ECOLOGY RESEARCH CENTER ASSUMES NO RESPONSIBILITY FOR A TRANSFORMATION OF THE EXISTING COVERAGE OR FOR RESULTS OBTAINED FROM A RUBBER SHEETING PROCESS.

Rubber sheeting can be applied if a generous number of control points are used. The accuracy of the final product, however, may still be in question depending on the number and accuracy of the links used (Andrew Duff, ESRI, personal communication)."

WCMC Documentation:

Documentation supplied with the WCMC version provides the following information:

Information Content

Digital data obtained from the Ecology Research Center, US Fish and Wildlife Service. The data has been digitized in table inches using an Altek Model 34-3 tablet with a resolution of 0.0254 mm and an accuracy of +/- .003 inches (+/- .08 mm) in ARC/INFO v.5.01. The projection of the map was unknown, although tic locations were provided. The transformation of the tic points was unsuccessful so the data [were] transformed up to World Data[bank] II and then rubber sheeted using over 7000 links. Rubber Sheeting was applied at WCMC. The (polygon) data is presented with a single polygon attribute file for each coverage.

Description of coverage Bailey [from ARC/INFO]:

Precision single

        ARCS                                    POLYGONS
Arcs            =       1574            Polygons        =       633
Segments        =       40213           Polygon Topology is present
0 bytes of Arc Attribute Data           186 bytes of Polygon Attribute Data

        NODES                                   POINTS
Nodes           =       1599            Label Points    =       632
0 bytes of Node Attribute Data

        TOLERANCES                              SECONDARY FEATURES
Fuzzy           =       0.006 V         Tics            =       288
Dangle          =       0.000 N         Links           =       0

        COVERAGE BOUNDARY
Xmin            =       -180.000        Ymin            =       -90.000
Xmax            =        180.000        Ymax            =        90.000

        COORDINATE SYSTEM DESCRIPTION
Projection      GEOGRAPHIC
Units           DD                      Spheroid        CLARK1866

The following table lists the original control points from NERC and those provided in the WCMC reprojected coverage:

IDTIC   XTIC(Orig)   YTIC(Orig)   XTIC(WCMC)   YTIC(WCMC)
    1         -180           60    -179.9959     59.99799
    2            0           80     0.000000     79.99663
    3           80           80     79.99514     79.99663
    4          180           60     179.9959     59.99799
    5          160           20     159.9998     19.99937
    6          160          -20     159.9998    -19.99937
    7          180          -60     179.9959    -59.99799
    8          120          -80     119.9927    -79.99663
    9          -20          -80    -19.99879    -79.99663
   10         -180          -60    -179.9959    -59.99799
   11         -140          -20    -139.9998    -19.99937
   12         -140           20    -139.9998     19.99937
   13          -40           20    -39.99995     19.99937
   14           80           20     79.99991     19.99937
   15           80          -20     79.99991    -19.99937
   16          -40          -20    -39.99995    -19.99937



The two sets of tic coordinates agree to within about 0.01 degree (the largest difference, at tic 8, is about 0.007 degree in X). It is uncertain, however, whether the WCMC tic coordinates were reprojected from the original table inches after the rubber sheeting parameters had been determined, or whether they were used as part of the control point array (in which case the agreement could not be taken as a general result for the overall map).
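The size of the tic differences can be checked directly from the table. The following is a minimal sketch in Python, with the sixteen tic pairs transcribed by hand from the table above; the function and variable names are illustrative only and are not part of either the NERC or the WCMC documentation.

```python
# Tic coordinates transcribed from the control point table:
# (id, x_orig, y_orig, x_wcmc, y_wcmc), all in decimal degrees.
TICS = [
    (1, -180, 60, -179.9959, 59.99799),
    (2, 0, 80, 0.0, 79.99663),
    (3, 80, 80, 79.99514, 79.99663),
    (4, 180, 60, 179.9959, 59.99799),
    (5, 160, 20, 159.9998, 19.99937),
    (6, 160, -20, 159.9998, -19.99937),
    (7, 180, -60, 179.9959, -59.99799),
    (8, 120, -80, 119.9927, -79.99663),
    (9, -20, -80, -19.99879, -79.99663),
    (10, -180, -60, -179.9959, -59.99799),
    (11, -140, -20, -139.9998, -19.99937),
    (12, -140, 20, -139.9998, 19.99937),
    (13, -40, 20, -39.99995, 19.99937),
    (14, 80, 20, 79.99991, 19.99937),
    (15, 80, -20, 79.99991, -19.99937),
    (16, -40, -20, -39.99995, -19.99937),
]


def tic_offsets(tics):
    """Return {tic id: (|dx|, |dy|)} between original and WCMC coordinates."""
    return {t[0]: (abs(t[3] - t[1]), abs(t[4] - t[2])) for t in tics}


offsets = tic_offsets(TICS)
max_dx = max(dx for dx, _ in offsets.values())
max_dy = max(dy for _, dy in offsets.values())
worst_id = max(offsets, key=lambda i: max(offsets[i]))

print(f"largest X difference: {max_dx:.5f} deg")
print(f"largest Y difference: {max_dy:.5f} deg")
print(f"tic with the largest difference: {worst_id}")
```

A check of this kind only describes the tic points themselves. As noted above, it says nothing about the rest of the map if the tics were part of the control point array.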



DATA INTEGRATION REPORT

John J. Kineman
National Geophysical Data Center
Boulder, CO 80303

Source Data:

The source dataset was obtained through Mark Collins of the World Conservation Monitoring Center in the U.K., following various unsuccessful attempts to unproject the original digital version provided to us by the National Ecology Research Center (digitized by Robert Waltermire from Robert Bailey's original map). The original map was thought to conform to an unknown Ginsberg modified Polyconic projection, but we were only able to confirm that it is based on a Russian projection used in the Gerasimov atlas (see Additional References, above). A copy of the Gerasimov atlas resides at NGDC in Boulder, and NGDC has close contacts with the cartographic institute which produced the atlas (part of the former USSR Academy of Sciences in Moscow). These contacts provided the following information about the projection used for Plate 75 of the atlas (which was apparently the base map for Bailey's work):

1. The projection is a modified polyconic projection of the USSR Geodetic and Cartographic Institute (Academy of Sciences).

2. In 1971 the projection was approximated at the institute by computer analysis using 9th order polynomials in latitude and longitude, with control points on a 5 degree grid. Otherwise, there is no known mathematical transformation.

The polynomial approximation techniques developed in Moscow for reprojection are similar to most rubber-sheeting methods (which also use polynomial approximation). Since the WCMC version was already done, and was kindly made available for the project, we decided to use that version.

Data Integration:

The WCMC version of Bailey's Ecoregions dataset was provided to us on floppy disk as a compressed Arc/Info Export file. It was exported from PC Arc/Info v.3.4D Plus to Idrisi 4.0, preserving coordinate precision to three decimal places and using region codes created from unique combinations of fields in the attribute table. Thus BEC.VEC is a vector data file with polygon IDs corresponding to codes that were assigned for each of the Bailey Ecoregion classes. The codes were created by numbering the unique occurrences of Domain, Division, and Province in the dataset. The resulting vector polygon file was also rasterized into a 10-minute global image file (BEC.IMG) in the GED format (see User's Guide). The attributes Domain and Division were similarly coded from the Polygon Attribute Table and assigned to the Ecoregions map to produce derived raster layers. Both BECDOM and BECDIV are thus simple reclassifications of BEC. Since the Province attribute is essentially the same as the BEC classes, a separate image/map was not created for it. The descriptive legend for the numerically coded classes in BEC was created by combining the information for Domain, Division, and Province from the attribute table.
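The coding step described above can be sketched as follows. This is an illustration in Python, not the procedure actually run in Arc/Info or Idrisi: the attribute values ("D1", "V1", "P1", ...) are placeholders rather than entries from the real Polygon Attribute Table, and numbering in order of first appearance is an assumption, since the numbering order is not documented here.

```python
def assign_codes(records):
    """Number each unique (domain, division, province) combination.

    Codes are assigned in order of first appearance (an assumption).
    """
    codes = {}
    for rec in records:
        key = (rec["domain"], rec["division"], rec["province"])
        if key not in codes:
            codes[key] = len(codes) + 1
    return codes


def reclass_table(codes, level):
    """Map each class code to a code for a coarser attribute.

    level 0 groups by domain, level 1 groups by division.
    """
    coarse = {}
    table = {}
    for key, class_code in codes.items():
        label = key[level]
        if label not in coarse:
            coarse[label] = len(coarse) + 1
        table[class_code] = coarse[label]
    return table


# Placeholder attribute records, one per polygon (hypothetical values).
records = [
    {"domain": "D1", "division": "V1", "province": "P1"},
    {"domain": "D1", "division": "V1", "province": "P2"},
    {"domain": "D1", "division": "V2", "province": "P3"},
    {"domain": "D2", "division": "V3", "province": "P4"},
    {"domain": "D1", "division": "V1", "province": "P1"},  # same class as the first
]

bec_codes = assign_codes(records)          # plays the role of the BEC class codes
becdom = reclass_table(bec_codes, 0)       # BEC code -> domain code
becdiv = reclass_table(bec_codes, 1)       # BEC code -> division code
print(bec_codes)
print(becdom)
print(becdiv)
```

Because the two lookup tables are keyed by the class code, applying them to the class raster is a pure reclassification, which is the sense in which BECDOM and BECDIV are derived from BEC.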

The vector polygon file produced for rasterization using the "DLG" procedure is included in the SOURCE directory. This file contains "reverse-digitized" hole polygons.

The vector file included in the main dataset (not the Source directory) was created using the "Ungenerate" command in Arc/Info. It does not, however, contain the same ID values described above, and is therefore documented in the metadata file (BEC.DVC) as a "line" file rather than a polygon file. In fact the lines are closed polygons, but they are not labeled as above. This file is provided for visual overlay, whereas the DLG-produced file stored in the SOURCE directory should be used for rasterizing or linking to the attribute table. The "Ungenerate" procedure produces a vector polygon map that does not "reverse out" the holes. Such files are better for visual overlay (for example in Idrisi) because the connecting lines between parent and hole polygons are removed; however, they are less robust for use in rasterizing because one must be certain that hole polygons follow sequentially in the file after the parent polygons. If this is not the case, the parent polygon data will "fill in" the hole during rasterization.
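The ordering problem can be illustrated with a toy rasterizer. The sketch below is in Python and assumes a simple "painter" model in which polygons are filled in file order and later polygons overwrite earlier ones; it is not the Idrisi rasterization code, and the two rings are hypothetical.

```python
def inside(x, y, ring):
    """Even-odd ray-casting test for a closed ring of (x, y) vertices."""
    hit = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Count edges that cross the horizontal ray running right from (x, y).
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            hit = not hit
    return hit


def rasterize(polygons, size):
    """Fill a size x size grid, painting polygons in the order given."""
    grid = [[0] * size for _ in range(size)]
    for value, ring in polygons:
        for r in range(size):
            for c in range(size):
                if inside(c + 0.5, r + 0.5, ring):  # test the cell centre
                    grid[r][c] = value
    return grid


# Hypothetical parent polygon (class 1) with a hole polygon (class 2) inside it.
parent = (1, [(0, 0), (6, 0), (6, 6), (0, 6), (0, 0)])
hole = (2, [(2, 2), (4, 2), (4, 4), (2, 4), (2, 2)])

hole_after_parent = rasterize([parent, hole], 6)
hole_before_parent = rasterize([hole, parent], 6)

print(hole_after_parent[2][2])   # cell inside the hole, hole painted last
print(hole_before_parent[2][2])  # same cell, parent painted last
```

With the hole listed after its parent the hole cells keep their own class; with the order reversed the parent overwrites them, which is the "fill in" effect described above.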

Quality Control and Testing

Various quality checks were performed during and after the integration process. First, visual comparisons were made with Micro World Data Bank II (MWDBII) in Arc/Info and Idrisi. Micro World Data Bank II is the standard georeference adopted by the GED Project, and is considered acceptable to 2-minute resolution (.033 deg.). Distance measurements were made in random areas appearing to have the greatest and/or characteristic disagreement. On this subjective basis, disagreements along the coast approaching 1 degree were found. This, however, does not appear to be an error in the re-projection process, since the MWDBII vectors are far more detailed than the Bailey polygon boundaries and no systematic or regional patterns of disagreement (i.e., consistent displacements in one direction or another over a significant region) were noticeable. Projection errors would be especially evident in the polar regions in such a visual comparison, but again, the general agreement between MWDBII coasts and the BEC coasts seemed consistent. Overall registration differences between the two data sets appeared to be considerably less than the mean difference between the coastlines, again supporting the hypothesis that discrepancies were primarily due to the resolution of the Bailey Ecoregion data, rather than projection or registration errors.

More rigorous statistical comparisons were performed to test the registration and general agreement with MWDBII coastlines.

Accuracy of Coastline compared to Micro World Data Bank II

The mean distance between the Bailey dataset coastline and the MWDBII coastline was calculated on a 10-minute raster grid by first producing a distance map from the land/water boundary in the Bailey dataset, then extracting statistics using the MWDBII coastline as the extraction feature. The results of this test were:

Mean coastline offset:          .22 deg.
Maximum offset:                 3.5 deg.
Standard deviation of offset:   .32 deg.

This indicates that about 90% of the coastal points are within .5 deg.
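The distance-map procedure described above can be sketched on a small grid. The Python below is an illustration under stated assumptions, not the software used for the test: distances are planar (cell counts multiplied by the cell size, ignoring the convergence of meridians), the distance map is computed by brute force, and the land mask and reference coastline are hypothetical.

```python
import math
import statistics


def coast_cells(land):
    """Land cells (value 1) with at least one 4-neighbour water cell (value 0)."""
    rows, cols = len(land), len(land[0])
    cells = []
    for r in range(rows):
        for c in range(cols):
            if land[r][c] != 1:
                continue
            neighbours = ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
            if any(0 <= rr < rows and 0 <= cc < cols and land[rr][cc] == 0
                   for rr, cc in neighbours):
                cells.append((r, c))
    return cells


def distance_map(rows, cols, targets, cell_deg):
    """Distance in degrees from every cell to the nearest target cell."""
    return [[min(math.hypot(r - tr, c - tc) for tr, tc in targets) * cell_deg
             for c in range(cols)] for r in range(rows)]


def offset_stats(land, ref_coast, cell_deg=1 / 6):
    """Mean, maximum and s.d. of the distance map sampled along ref_coast.

    cell_deg = 1/6 corresponds to a 10-minute grid.
    """
    dist = distance_map(len(land), len(land[0]), coast_cells(land), cell_deg)
    samples = [dist[r][c] for r, c in ref_coast]
    return statistics.mean(samples), max(samples), statistics.pstdev(samples)


# Hypothetical 5 x 6 mask: three columns of land, then water.
land = [[1, 1, 1, 0, 0, 0] for _ in range(5)]
# Hypothetical reference coastline cells (standing in for MWDBII).
ref_coast = [(0, 2), (1, 3), (2, 4), (3, 2), (4, 2)]

mean_off, max_off, sd_off = offset_stats(land, ref_coast)
print(f"mean {mean_off:.3f}  max {max_off:.3f}  s.d. {sd_off:.3f} (deg.)")
```

Sampling the distance map only at the reference coastline cells is what "using the MWDBII coastline as the extraction feature" amounts to in this sketch.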

Registration

Next, registration was tested by perturbing the origin in the above analysis, so that comparisons were made with a one-pixel offset in each of four diagonal directions. The results of this test were:

Perturbation      Mean coastal offset (deg.)

(x+1, y+1)        .25
(x-1, y-1)        .24
(x+1, y-1)        .25
(x-1, y+1)        .25

Mean coastal offset increases with increasing mis-registration. As expected, the offset rose (from .22 to .24-.25) for the perturbations in all four directions, so the unperturbed position gives the best fit. This indicates that no registration error is detectable at 10-minute resolution.
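The logic of the perturbation test can be shown with two sets of coastline cells. This Python sketch assumes planar distances measured in cells and uses a hypothetical square "coastline" that is perfectly registered to begin with; the function names are illustrative and are not those of the software used for the test.

```python
import math


def mean_offset(coast_a, coast_b):
    """Mean distance (in cells) from each cell of coast_b to the nearest cell of coast_a."""
    return sum(min(math.hypot(r - ra, c - ca) for ra, ca in coast_a)
               for r, c in coast_b) / len(coast_b)


def perturbation_test(coast_a, coast_b):
    """Mean offset for the unshifted case and for one-cell diagonal shifts of coast_b."""
    shifts = [(0, 0), (1, 1), (-1, -1), (1, -1), (-1, 1)]
    return {s: mean_offset(coast_a, [(r + s[0], c + s[1]) for r, c in coast_b])
            for s in shifts}


# Hypothetical coastline: the perimeter of a square from (2, 2) to (6, 6).
square = [(r, c) for r in range(2, 7) for c in range(2, 7)
          if r in (2, 6) or c in (2, 6)]

results = perturbation_test(square, square)
for shift, offset in results.items():
    print(shift, round(offset, 3))
```

If the two coastlines were mis-registered by about a pixel, one of the shifted cases would give a smaller mean offset than the unshifted one. A minimum at zero shift is the pattern reported in the table above.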

Distance between points along a polygon boundary

As a final check, statistics were produced (using a program developed by Mark Ohrenschall of NGDC) on the point spacing within the Bailey vector data. The results of this test were (numerical values below are in units of degrees):

                        All points          Means by    S.d. by
                        weighted equally    polygon     polygon

Mean point spacing:     .33                 .33         .2
Standard deviation:     .5                  .1          ---
Maximum:                *8                  .9          2.8

* excluding straight lines, which can have point spreads up to 58 degrees.

This indicates about 20-minute resolution at 70% confidence, 30-minute resolution at 80%, and about 50-minute resolution at 99%. Also, the polygon resolutions are evenly distributed (the mean over all points equals the mean of the polygon means) and fairly consistent (s.d. of the polygon mean point spacing = .1), although point spacings approaching 1 deg. were common among polygons. The average (across polygons) of standard deviations of spacings within polygons was .2. The maximum s.d. of spacings within a polygon was 2.8.

Conclusion

One must remember that grid resolution, i.e., the spacing or size of raster cells, is not the same as feature resolution. For a raster representation of this dataset that does not lose information, a 10-minute grid was necessary, although the spatial resolution of the data is no better than .5 degree and may approach 1 degree. This is because resolution of vector data varies between features, and is different for relative locations of polygon boundaries than it is for absolute locations or information internal to those boundaries.

A feature resolution of between .3 and .8 deg seems like a good estimate overall. This is supported by the mean point spacing along lines as well as the mean locational error of the coastlines. It is interesting how well this corresponds (assuming a digitizing accuracy of 1 mm) to the mixed scales, 1:30,000,000 for the original Bailey map and USDA Forest Service digitized version, and 1:80,000,000 for the Russian data published in the Gerasimov atlas, which was used as a major source.
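The correspondence with map scale can be checked with a short calculation. The Python sketch below assumes that one degree of arc is about 111.32 km (a round figure for a great circle, not a value taken from this report) and uses the 1 mm digitizing accuracy stated above.

```python
KM_PER_DEGREE = 111.32  # assumed approximate length of one degree of arc


def ground_error_deg(scale_denominator, map_error_mm=1.0):
    """Convert a positional error on the paper map to degrees on the ground."""
    ground_km = map_error_mm * scale_denominator / 1_000_000  # mm to km
    return ground_km / KM_PER_DEGREE


for denominator in (30_000_000, 80_000_000):
    error = ground_error_deg(denominator)
    print(f"1:{denominator:,}  ->  {error:.2f} deg")
```

Under these assumptions, 1 mm corresponds to roughly 0.27 degree at 1:30,000,000 and roughly 0.72 degree at 1:80,000,000, which brackets the estimated feature resolution of .3 to .8 deg.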

For the raster images, representation at 10 minutes was chosen to preserve boundary relationships and relative detail. A rule of thumb in remote sensing is that ground resolution is limited to about 2-3 pixels. The same ratio thus exists for a 10-minute raster version of this dataset (resolvable to only .5 degree information, i.e., 3 pixels). The difference between representation and resolution is thus apparent and unavoidable.

The comparisons performed to test resolution and registration show that the rubber sheeting process performed by WCMC was successful and probably an order of magnitude more accurate than the data in both projection and registration. Nevertheless, this was tested only along the coastlines (unless the tic points were preserved for testing and not used as control points in the rubber sheeting process - this would have to be confirmed with WCMC). Position error of the coastline was consistent with the point spacing.
