| .names file created by George John, October 1994
|
|1. TITLE
| SATELLITE IMAGE DATASET (STATLOG VERSION)
|
| PURPOSE
| The StatLog database consists of the multi-spectral values
| of pixels in 3x3 neighbourhoods in a satellite image,
| and the classification associated with the central pixel
| in each neighbourhood. The aim is to predict this
| classification, given the multi-spectral values. In
| the sample database, the class of a pixel is coded as
| a number.
|
|
|2. USE in STATLOG
| 2.1 Testing Mode
|     TRAIN/TEST.
| 2.2 Special PreProcessing
|     None
| 2.3 Test Results
|           Success Rate (%)       TIME
| Algorithm  Train     Test    Train    Test
| ------------------------------------------
| KNN         91.1   90.600     2105     944
| LVQ            ?   89.500
| Dipol92        ?   88.900
| Radial      88.9   87.900      723      74
| Alloc80     96.4   86.800    63840   28757
| IndCart     98.9   86.200     2109       9
| Cart        92.1   86.200      348      14
| BackProp    88.8   86.100    54371      39
| BayTree        ?    85.30
| NewId       93.3   85.000      296      53
| Cn2         98.6   85.000     1718      16
| C4.5        95.7   85.000      449      11
| Cal5        87.8   84.900     1345      13
| QuaDisc     89.4   84.500      276      93
| Ac2            ?   84.300     8244   17403
| Smart       87.7   84.100    83068      20
| LogDisc     88.1   83.700     4414      41
| Cascade        ?   83.700
| Discrim     85.1   82.900       68      12
| Kohonen     89.9   82.100    12627     129
| Castle         ?   80.600
| Bayes       71.3   71.300       56      12
| Default    24.00   24.000
| Itrule         ?    0.000      253     253
|
|3. SOURCES and PAST USAGE
|
| The original database was generated from Landsat Multi-Spectral
| Scanner image data. These and other forms of remotely
| sensed imagery can be purchased at a price from relevant
| governmental authorities. The data is usually in binary
| form, and distributed on magnetic tape(s).
|
| SOURCE
| The small sample database was provided by:
|     Ashwin Srinivasan
|     Department of Statistics and Modelling Science
|     University of Strathclyde
|     Glasgow
|     Scotland
|     UK
|
| ORIGIN
| The original Landsat data for this database was generated
| from data purchased from NASA by the Australian Centre
| for Remote Sensing, and used for research at:
|     The Centre for Remote Sensing
|     University of New South Wales
|     Kensington, PO Box 1
|     NSW 2033
|     Australia.
|
| The sample database was generated taking a small section (82
| rows and 100 columns) from the original data. The binary values
| were converted to their present ASCII form by Ashwin Srinivasan.
| The classification for each pixel was performed on the basis of
| an actual site visit by Ms. Karen Hall, when working for Professor
| John A. Richards, at the Centre for Remote Sensing at the University
| of New South Wales, Australia. Conversion to 3x3 neighbourhoods and
| splitting into test and training sets was done by Alistair Sutherland
| at Strathclyde University.
|
| HISTORY
| The Landsat satellite data is one of the many sources of information
| available for a scene. The interpretation of a scene by integrating
| spatial data of diverse types and resolutions including multispectral
| and radar data, maps indicating topography, land use etc. is expected
| to assume significant importance with the onset of an era characterised
| by integrative approaches to remote sensing (for example, NASA's Earth
| Observing System commencing this decade). Existing statistical methods
| are ill-equipped for handling such diverse data types. Note that this
| is not true for Landsat MSS data considered in isolation (as in
| this sample database). This data satisfies the important requirements
| of being numerical and at a single resolution, and standard
| maximum-likelihood classification performs very well.
| Consequently, for this data, it should be interesting to compare the
| performance of other methods against the statistical approach.
|
|4. DATASET DESCRIPTION
| One frame of Landsat MSS imagery consists of four digital images
| of the same scene in different spectral bands. Two of these are
| in the visible region (corresponding approximately to green and
| red regions of the visible spectrum) and two are in the (near)
| infra-red. Each pixel is an 8-bit binary word, with 0 corresponding
| to black and 255 to white. The spatial resolution of a pixel is about
| 80m x 80m. Each image contains 2340 x 3380 such pixels.
|
| The database is a (tiny) sub-area of a scene, consisting of 82 x 100
| pixels. Each line of data corresponds to a 3x3 square neighbourhood
| of pixels completely contained within the 82x100 sub-area. Each line
| contains the pixel values in the four spectral bands
| (converted to ASCII) of each of the 9 pixels in the 3x3 neighbourhood
| and a number indicating the classification label of the central pixel.
| The number is a code for one of the classes listed under
| NUMBER OF CLASSES below.
|
| NUMBER OF EXAMPLES
|     training set 4435
|     test set     2000
|
| NUMBER OF ATTRIBUTES
|     36 (= 4 spectral bands x 9 pixels in neighbourhood)
|
| ATTRIBUTES
|     The attributes are numerical, in the range 0 to 255.
|
| NUMBER OF CLASSES
|
| There are 6 decision classes: 1, 2, 3, 4, 5 and 7.
|
| NB. There are no examples with class 6 in this dataset;
| they have all been removed because of doubts about the
| validity of this class.
|
| N  Description                        Train           Test
| ------------------------------------------------------------
| 1  red soil                           1072 (24.17%)   461 (23.05%)
| 2  cotton crop                         479 (10.80%)   224 (11.20%)
| 3  grey soil                           961 (21.67%)   397 (19.85%)
| 4  damp grey soil                      415 ( 9.36%)   211 (10.55%)
| 5  soil with vegetation stubble        470 (10.60%)   237 (11.85%)
| 6  mixture class (all types present)
| 7  very damp grey soil                1038 (23.40%)   470 (23.50%)
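|
| The figures in the class table above can be cross-checked with a few
| lines of arithmetic. The following is a minimal sketch only: the
| function and variable names are hypothetical and are not part of the
| original StatLog material. It recomputes the per-class percentages
| and counts how many 3x3 neighbourhoods fit completely inside an
| 82 x 100 sub-area.
|

```python
# Class counts copied from the table above (class 6 has no examples).
TRAIN_COUNTS = {1: 1072, 2: 479, 3: 961, 4: 415, 5: 470, 7: 1038}
TEST_COUNTS = {1: 461, 2: 224, 3: 397, 4: 211, 5: 237, 7: 470}


def class_percentages(counts):
    """Return {class label: percentage of the set}, rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100.0 * n / total, 2) for label, n in counts.items()}


def full_neighbourhoods(rows, cols, size=3):
    """Count size x size windows lying entirely inside a rows x cols grid."""
    return (rows - size + 1) * (cols - size + 1)


train_total = sum(TRAIN_COUNTS.values())   # examples in the training set
test_total = sum(TEST_COUNTS.values())     # examples in the test set
possible = full_neighbourhoods(82, 100)    # windows in the 82 x 100 sub-area
absent = possible - (train_total + test_total)  # windows in neither set
```

| Under these assumptions the class counts sum to the stated set sizes
| (4435 and 2000), and the sub-area admits 80 x 98 = 7840 complete
| neighbourhoods, so 1405 of them appear in neither set.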
|
| The data is given in random order and certain lines of data
| have been removed so you cannot reconstruct the original image
| from this dataset.
|
| In each line of data the four spectral values for the top-left
| pixel are given first followed by the four spectral values for
| the top-middle pixel and then those for the top-right pixel,
| and so on with the pixels read out in sequence left-to-right and
| top-to-bottom. Thus, the four spectral values for the central
| pixel are given by attributes 17,18,19 and 20. If you like you
| can use only these four attributes, while ignoring the others.
| This avoids the problem which arises when a 3x3 neighbourhood
| straddles a boundary.
|
|
|CONTACTS
| statlog-adm@ncc.up.pt
| bob@stams.strathclyde.ac.uk
|
|
|================================================================================
|
1,2,3,4,5,7.

A1: continuous.
A2: continuous.
A3: continuous.
A4: continuous.
A5: continuous.
A6: continuous.
A7: continuous.
A8: continuous.
A9: continuous.
A10: continuous.
A11: continuous.
A12: continuous.
A13: continuous.
A14: continuous.
A15: continuous.
A16: continuous.
A17: continuous.
A18: continuous.
A19: continuous.
A20: continuous.
A21: continuous.
A22: continuous.
A23: continuous.
A24: continuous.
A25: continuous.
A26: continuous.
A27: continuous.
A28: continuous.
A29: continuous.
A30: continuous.
A31: continuous.
A32: continuous.
A33: continuous.
A34: continuous.
A35: continuous.
A36: continuous.
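|
| The attribute ordering described above (pixels read left-to-right and
| top-to-bottom, four spectral values per pixel) can be illustrated with
| a short Python sketch. It is not part of the original StatLog
| material. It assumes that each data line holds 36 whitespace-separated
| integers followed by the class label as the final value; the helper
| names parse_line and central_pixel are hypothetical, and the sample
| line is synthetic (attribute i simply has the value i).
|

```python
N_PIXELS = 9   # pixels in a 3x3 neighbourhood
N_BANDS = 4    # spectral values per pixel


def parse_line(line):
    """Split one data line into (pixels, label).

    Assumes 36 whitespace-separated integers followed by the class label.
    pixels[k] holds the four spectral values of the k-th pixel, counting
    left-to-right, top-to-bottom from 0.
    """
    values = [int(tok) for tok in line.split()]
    if len(values) != N_PIXELS * N_BANDS + 1:
        raise ValueError("expected 36 attributes plus a class label")
    attrs, label = values[:-1], values[-1]
    pixels = [attrs[k * N_BANDS:(k + 1) * N_BANDS] for k in range(N_PIXELS)]
    return pixels, label


def central_pixel(pixels):
    """Return the four spectral values of the centre pixel (attributes 17-20)."""
    return pixels[N_PIXELS // 2]


# Synthetic line: attribute i has value i, and the label is 7.
sample = " ".join(str(i) for i in range(1, 37)) + " 7"
pixels, label = parse_line(sample)
centre = central_pixel(pixels)
```

| With the synthetic line, the centre pixel comes out as the values
| 17, 18, 19 and 20, matching the attribute numbers quoted above. A
| classifier restricted to these four values uses only the centre pixel.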