Database creation
 

Database design

  • provides a comprehensive framework for the database
  • allows the database to be viewed in its entirety so that interaction between elements can be evaluated
  • permits identification of potential problems and design alternatives
  • without a good database design, there may be
    • irrelevant data that will not be used
    • omitted data
    • no update potential
    • inappropriate representation of entities
    • lack of integration between various parts of the database
    • unsupported applications
    • major additional costs to revise the database

Issues in database design

  • what storage media to use?
    • how large is the database?
    • how much can be stored online? what access speed is required for what parts of the database?
    • how should the database be laid out on the various media?
    • what growth should be allowed for in acquiring storage devices?
  • how will the database change over time?
    • will new attributes be added?
    • will the number of features stored increase?
  • how should the data be partitioned - both geographically and thematically?
    • is source data partitioned?
    • will products be partitioned?
  • what security is needed?
    • who should be able to redefine schema - new attributes, new objects, new object classes?
    • who should be able to edit and update?
  • should the database be distributed or centralized?
    • if distributed, how will it be partitioned between hosts?
  • how should the database be documented?
    • who is responsible for maintaining standards of definition? standards of format? accuracy? should documentation include access to the compiler of the data?
  • how should database creation be scheduled?
    • where will the data come from?
    • who determines product priorities?
    • who is responsible for scheduling data availability?
  • the following sections address some of these questions.

Key hardware parameters

Volume

  • databases for GIS applications range from a few megabytes (a small resource management project) to terabytes
    • a small raster-based project using IDRISI, 100 by 200 cells, 50 layers, might require a 10 Mbyte database on a PC
    • a mid-sized vector-based project for a National Forest using ARC/INFO might require 300 Mbytes
    • a national, archival database might reach many hundreds of Gbytes
    • the spatial database represented by the currently accumulated imagery of Landsat is of order 10^13 bytes
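
The small raster example above can be multiplied out as a rough check. This is only a sketch: the function name and the bytes-per-value parameter are assumptions, since the notes do not say how many bytes each cell value occupies or how much overhead the software adds.

```python
def raster_size_bytes(rows, cols, layers, bytes_per_value):
    """Raw storage for a multi-layer raster, ignoring headers and compression."""
    return rows * cols * layers * bytes_per_value

# 100 by 200 cells, 50 layers, as in the small raster example
cell_values = raster_size_bytes(100, 200, 50, 1)    # count of cell values at 1 byte each
ten_bytes_each = raster_size_bytes(100, 200, 50, 10)  # assumed 10 bytes per value

print(cell_values, ten_bytes_each)
```

Under these assumptions the project holds one million cell values, so the quoted 10 Mbytes corresponds to roughly 10 bytes per value, or to smaller values plus file and software overhead.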

Access speed

  • data which can be accessed in order 1 second is said to be "on-line"
    • to be on-line, data must be stored on fixed or removable disk
    • relative to other forms of permanent storage, disk costs are high, and there is an effective upper limit of order 100 Gbytes for on-line storage when using common magnetic disk technology
  • "archival" data (data which is comparatively stable through time) can be stored off-line until needed
    • only extracts will be on-line for analysis at any one time
    • archival systems incur additional time to mount media on hardware
    • access time to extract subsets from archival data once mounted is order 1 minute
  • archival media:
    • magnetic tape
    • removable disk
    • CD-ROM
      • no ability to edit data once written - this is acceptable for many types of geographical data
      • copies are very cheap
    • optical WORM (Write Once Read Many)
    • "video" tape

Network configuration

  • should the database be centralized or distributed?
  • there are two options:
    • all departments share one common database, or
    • parts of the database exist on different workstations in an integrated network
      • each department is responsible for maintaining its own share of the database
      • optimizes use of expertise
  • with modern technology (e.g. NFS (Network File System)) the user may be unaware of the actual location of the data being used
    • some workstations may be "diskless", owning no part of the database
  • distributed databases require careful attention to responsibilities, standards and scheduling of updates

Integrating Quality Assurance into the GIS Project Life Cycle

Without data there would be no need for the computers, software and human resources that comprise GIS technology. Not just any data, but geographic data. And not just any geographic data, but data that is specific and reliable and that represents as closely as possible the spatial world we live in. The technology requires that the data be as clean, as healthy, as good as it can be. Neglecting that, the usefulness of the technology is short-lived. To maximize the quality of GIS databases there should exist a well-designed quality assurance plan that is strategically integrated with all facets of the GIS project.

Categories of Quality Assurance

All well-designed QA strategies have certain things in common. They must coexist within the processes that create and maintain the data. When they are not integrated within the procedures of the GIS project, they themselves can become an entry point for error. By definition, they must also incorporate key elements from the classic QA categories that are discussed below.

Completeness

Completeness is the adherence of the data to the database design. This means that all of the data conforms to a known standard for topology, table structure, precision, projection and other data-model specific requirements.
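
One part of this definition, conformance to the table structure, can be sketched as a simple record check. The schema, field names and function name below are hypothetical and chosen only for illustration; a real check would also cover topology, precision and projection.

```python
def structure_problems(record, schema):
    """List the ways a record departs from the expected table structure."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems

# Hypothetical design: every parcel has an integer ID and a text land use
PARCEL_SCHEMA = {"PARCEL_ID": int, "LANDUSE": str}

print(structure_problems({"PARCEL_ID": 1, "LANDUSE": "water"}, PARCEL_SCHEMA))
print(structure_problems({"PARCEL_ID": "1", "EXTRA": 0}, PARCEL_SCHEMA))
```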

Validity

Validity is a measure of the attribute accuracy of the database. Each attribute must have a defined domain and range. The domain is the set of all legal values for the attribute. The range is the set of values within which the data must fall.
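
A minimal sketch of a validity check follows, assuming a domain is given as a set of legal values and a range as a pair of inclusive bounds. The land use values and slope limits are hypothetical examples, not part of any standard.

```python
def is_valid(value, domain=None, value_range=None):
    """Check a value against an optional domain and an optional inclusive range."""
    if domain is not None and value not in domain:
        return False
    if value_range is not None:
        low, high = value_range
        if not (low <= value <= high):
            return False
    return True

# Hypothetical attribute definitions
LANDUSE_DOMAIN = {"water", "forest", "urban"}
SLOPE_RANGE = (0, 90)  # degrees, assumed

print(is_valid("water", domain=LANDUSE_DOMAIN))
print(is_valid(120, value_range=SLOPE_RANGE))
```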

Logical Consistency

Logical consistency is a measure of the interaction between the values of two or more functionally related attributes. When the value of one attribute changes, the values of its functionally related attributes must change with it to maintain consistency. An example would be the interaction between the attribute SLOPE and the attribute LANDUSE. If LANDUSE is "water", then SLOPE must be 0; any other value for SLOPE would be illogical.
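
The SLOPE and LANDUSE rule can be written as a small check. This is a sketch that assumes each record is a dictionary keyed by attribute name; the function name is hypothetical.

```python
def slope_landuse_consistent(record):
    """A record with LANDUSE 'water' is consistent only when SLOPE is 0."""
    if record["LANDUSE"] == "water":
        return record["SLOPE"] == 0
    return True  # the rule places no constraint on other land uses

records = [
    {"LANDUSE": "water", "SLOPE": 0},
    {"LANDUSE": "water", "SLOPE": 12},
    {"LANDUSE": "forest", "SLOPE": 12},
]
flagged = [r for r in records if not slope_landuse_consistent(r)]
print(flagged)
```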

Physical Consistency

Physical consistency is a measure of the topological correctness and geographic extent of the database. For example, the requirement that every transformer in an electrical distribution GIS database have annotation denoting phasing placed within fifteen feet of the transformer object is a physical consistency requirement.
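
The transformer annotation rule amounts to a distance test. The sketch below assumes planar coordinates measured in feet and treats "within fifteen feet" as inclusive; both choices, and the function name, are assumptions made for illustration.

```python
import math

MAX_OFFSET_FT = 15.0  # limit taken from the example requirement

def annotation_within_limit(transformer_xy, annotation_xy, limit=MAX_OFFSET_FT):
    """True when the annotation lies no farther than `limit` from the transformer."""
    dx = annotation_xy[0] - transformer_xy[0]
    dy = annotation_xy[1] - transformer_xy[1]
    return math.hypot(dx, dy) <= limit

print(annotation_within_limit((0.0, 0.0), (3.0, 4.0)))    # 5 ft away
print(annotation_within_limit((0.0, 0.0), (12.0, 16.0)))  # 20 ft away
```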

Referential Integrity

Referential integrity is a measure of the associativity of related tables based upon their primary and foreign key relationships. Primary and foreign keys must exist and they must associate sets of data in the tables given predefined rules for each table.
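
A simple way to test this is to look for foreign key values that have no matching primary key. The sketch below assumes tables are held as lists of dictionaries; the table and field names are hypothetical.

```python
def orphaned_rows(child_rows, fk_field, parent_rows, pk_field):
    """Return child rows whose foreign key matches no primary key in the parent table."""
    parent_keys = {row[pk_field] for row in parent_rows}
    return [row for row in child_rows if row[fk_field] not in parent_keys]

# Hypothetical tables: parcels refer to owners through OWNER_ID
owners = [{"OWNER_ID": 1}, {"OWNER_ID": 2}]
parcels = [
    {"PARCEL_ID": 10, "OWNER_ID": 1},
    {"PARCEL_ID": 11, "OWNER_ID": 3},  # no such owner
]
print(orphaned_rows(parcels, "OWNER_ID", owners, "OWNER_ID"))
```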

Positional Accuracy


Positional accuracy is a measure of how well each spatial object's position in the database matches reality. Positional error can be introduced via incorrect cartographic interpretation, through insufficient densification of vertices in line segments or through digital storage precision inadequacies, to name a few. These errors can be random, systematic and/or cumulative in nature. Positional accuracy must always be qualified because the database is, after all, only a map of reality.
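
Storage precision is the easiest of these error sources to demonstrate. The sketch below assumes a coordinate is stored as a 32-bit float and uses a hypothetical northing in metres; it round-trips the value through single precision and reports the positional error introduced by storage alone.

```python
import struct

def as_float32(value):
    """Round-trip a number through 32-bit floating point storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

northing = 5456789.123          # hypothetical coordinate, metres assumed
stored = as_float32(northing)   # value as it would be read back
error = abs(stored - northing)  # error caused purely by storage precision

print(stored, error)
```

At this magnitude single precision can only represent values half a unit apart, so the stored coordinate may be off by up to a quarter of a unit before any other source of error is considered.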

(Acknowledgements: http://www.geog.ubc.ca/)

 
Site © GlobalCoordinates 2005 - 2011