Name | Implementation Language | Datastore | Target Road Network | Address Parsing Technique | Fuzzy Name Matching Technique | Provides Configuration Language? | Last Activity
Explorer GeoCoder | C++ | ? | TIGER | Regex +? | Soundex? | No | ?
Frost TIGER Geocoder | PostgreSQL SQL | PostgreSQL | TIGER | Regex | Soundex? | No | ?
Geo-Coder-US | Perl | ? | TIGER | Regex | Metaphone? | No | ?
Geocoder::US | Ruby, C | SQLite | TIGER | Regex | Metaphone | No | ?
JGeoCoder | Java | JDBC (H2 image supplied) | TIGER | Regex, Java code | Soundex? | No | 2008
PAGC | C | Berkeley DB | TIGER, Statistics Canada Road Network Files | "based on Aho-Corasick string matching" | Soundex, Edit Distance | Yes, to some extent | 2011?

Name | Maintainer | Description
Explorer GeoCoder | SRC | "a data and country independent geocoding engine" which can "assign latitude and longitude coordinates to any United States street address or intersection"
Frost TIGER Geocoder | Stephen Frost et al. | A geocoder for TIGER data implemented in PostgreSQL
Geo-Coder-US | Schuyler Erle | For geocoding US addresses, that is, estimating the latitude and longitude of any street address or intersection in the United States, using the TIGER/Line data set
Geocoder::US | GeoCommons | A rewrite of Geo-Coder-US in Ruby (also requiring C and SQLite). "Although it is primarily intended for use with the US Census Bureau's free TIGER/Line dataset, it uses an abstract US address data model that can be employed with other sources of US street address range data"
JGeoCoder | ??? | A Java API loosely modelled after Geo::Coder::US
PAGC | http://www.pagcgeo.org/ | "The Postal Address Geo-Coder (or PAGC) is a library and a CGI based web service written in ANSI C that uses an address-ranged street network shapefile along with one or more postal addresses and provides the longitude/latitude coordinates of each matched address. PAGC has been designed to make it easily extensible to the postal address structure of many Western countries. Out of the box it works with publicly available road network data sets from the US (shapefiles of the US Census Bureau's TIGER/Line data) and Canada (shapefiles of Statistics Canada's Road Network Files)."
Address Parsing Technique
The most direct way to implement address parsing is to hard-code it in a programming language, often making use of the language's support for regular expressions. This approach has the downside of being hard to understand and difficult to modify (both for users and for the application maintainers themselves!).
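To make the hand-coded approach concrete, here is a minimal Python sketch of a regular-expression parser for simple US street addresses such as "123 N Main St". The pattern, the small sets of directionals and street types, and the `parse_address` function are illustrative assumptions, not code taken from any of the geocoders listed above.

```python
import re
from typing import Dict, Optional

# Hypothetical hand-coded pattern for "<number> [<direction>] <name> <type>".
# The directional and street-type alternatives are a deliberately tiny sample.
ADDRESS_RE = re.compile(
    r"""^\s*
        (?P<number>\d+)\s+                       # house number
        (?:(?P<predir>N|S|E|W|NE|NW|SE|SW)\s+)?  # optional pre-directional
        (?P<name>.+?)\s+                         # street name (non-greedy)
        (?P<suffix>St|Ave|Rd|Blvd|Dr|Ln|Way)\.?  # street type, optional period
        \s*$""",
    re.IGNORECASE | re.VERBOSE,
)

def parse_address(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the matched address components, or None if the pattern fails."""
    m = ADDRESS_RE.match(text)
    return m.groupdict() if m else None

print(parse_address("123 N Main St"))
# {'number': '123', 'predir': 'N', 'name': 'Main', 'suffix': 'St'}
```

Supporting unit numbers, post-directionals or additional street types means editing the pattern itself, which is exactly the maintenance problem described above.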
More sophisticated address parsers express the parsing
rules as a grammar that drives the parsing algorithm.
Ideally, the grammar is exposed through a configuration language,
so that it can be customized for different address domains
(for example, the postal address structures of other countries).
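By contrast, the following sketch keeps the parsing rules as data: a toy grammar in which each rule is a sequence of token classes, and "repeatable" items may consume several words so that multi-word names such as "Old Mill" are handled. The token classes, rules and function names are hypothetical and far simpler than a production address grammar; in a real system `TOKEN_CLASSES` and `RULES` would be loaded from a configuration file rather than written in code.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical token classes; NUM and WORD are built in (see classify).
TOKEN_CLASSES: Dict[str, Set[str]] = {
    "DIR": {"N", "S", "E", "W", "NORTH", "SOUTH", "EAST", "WEST"},
    "TYPE": {"ST", "STREET", "AVE", "AVENUE", "RD", "ROAD"},
}

# Hypothetical rules, tried in order. Each item is (token class, output field,
# repeatable); a repeatable item consumes one or more consecutive tokens.
Rule = List[Tuple[str, str, bool]]
RULES: List[Rule] = [
    [("NUM", "number", False), ("DIR", "predir", False),
     ("WORD", "name", True), ("TYPE", "type", False)],
    [("NUM", "number", False), ("WORD", "name", True), ("TYPE", "type", False)],
]

def classify(token: str) -> List[str]:
    """Return every token class that the token belongs to."""
    classes = [cls for cls, words in TOKEN_CLASSES.items() if token.upper() in words]
    if token.isdigit():
        classes.append("NUM")
    classes.append("WORD")  # any token may be part of a street name
    return classes

def match(rule: Rule, tokens: List[str]) -> Optional[Dict[str, str]]:
    """Match the whole token list against a rule, backtracking over repeatable items."""
    if not rule:
        return {} if not tokens else None
    cls, field, repeatable = rule[0]
    max_take = len(tokens) if repeatable else min(1, len(tokens))
    for take in range(1, max_take + 1):
        taken = tokens[:take]
        if not all(cls in classify(t) for t in taken):
            break  # any longer span contains the same non-matching token
        rest = match(rule[1:], tokens[take:])
        if rest is not None:
            return {field: " ".join(taken), **rest}
    return None

def parse(text: str) -> Optional[Dict[str, str]]:
    """Return the fields from the first rule that matches, or None."""
    tokens = text.split()
    for rule in RULES:
        result = match(rule, tokens)
        if result is not None:
            return result
    return None

print(parse("123 Old Mill Rd"))
# {'number': '123', 'name': 'Old Mill', 'type': 'Rd'}
```

Adding a new street type or a new address pattern here means changing data, not parser code, which is what makes exposing the grammar through a configuration language practical.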
Fuzzy Name Matching Technique
Another challenging area in geocoding is supporting
approximate or fuzzy matching of input street names to
the reference street dataset.
This is essential to accommodate spelling mistakes and user uncertainty
in real-world input.
A variety of techniques are commonly used:
Soundex, Metaphone, character bigrams, edit distance, etc.
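As an example of one of these techniques, here is a sketch of the commonly described American Soundex algorithm: keep the first letter, encode the remaining consonants as digits, collapse runs of the same digit (including across H and W, but not across vowels), and pad or truncate the result to four characters. Names that sound alike, such as "Smith" and "Smyth", receive the same code. The function name is an illustrative choice, and Soundex variants differ on small details such as the treatment of Y.

```python
# Digit codes for consonants in American Soundex; other letters get no code.
SOUNDEX_CODES = {
    ch: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for ch in letters
}

def soundex(name: str) -> str:
    """Return the four-character Soundex code of a name, or "" if it has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")  # the first letter counts for collapsing
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # H and W are transparent: they keep the previous code, so equal codes on
        # either side collapse. Vowels (here including Y) reset it, so equal codes
        # on either side are both kept.
        if ch not in "HW":
            prev = code
    return result.ljust(4, "0")

print(soundex("Smith"), soundex("Smyth"))  # S530 S530
```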
One challenge with fuzzy name matching against a large
corpus of valid names is achieving efficient performance,
since comparing the input against every name in turn does not scale.
For this, an indexing strategy is almost certainly required.
Each fuzzy matching approach may require a different indexing
technique.
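One possible indexing strategy for bigram matching is an inverted index from each character bigram to the reference names that contain it, so that only names sharing at least one bigram with the query are scored at all. The sketch below scores those candidates with the Dice coefficient; the class name, the space padding and the 0.5 threshold are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

def bigrams(s: str) -> Set[str]:
    """Distinct character bigrams of the upper-cased string padded with spaces."""
    padded = f" {s.upper()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

class BigramIndex:
    """Inverted index from character bigram to the reference names containing it."""

    def __init__(self, names: Iterable[str]) -> None:
        self.grams: Dict[str, Set[str]] = {}
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        for name in names:
            self.grams[name] = bigrams(name)
            for gram in self.grams[name]:
                self.postings[gram].add(name)

    def search(self, query: str, threshold: float = 0.5) -> List[Tuple[str, float]]:
        """Return (name, Dice score) pairs at or above threshold, best first."""
        q = bigrams(query)
        # Only names sharing at least one bigram with the query are scored.
        candidates: Set[str] = set()
        for gram in q:
            candidates |= self.postings.get(gram, set())
        results = []
        for name in candidates:
            g = self.grams[name]
            score = 2 * len(q & g) / (len(q) + len(g))
            if score >= threshold:
                results.append((name, score))
        return sorted(results, key=lambda r: (-r[1], r[0]))

index = BigramIndex(["MAIN", "MAPLE", "MARKET", "ELM"])
print(index.search("Markt"))
```

A Soundex-based matcher, by comparison, would need only a hash table keyed on the Soundex code, which illustrates how the choice of index follows from the matching technique.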
Configuration Language
Real-world address models and road network datasets are typically
fairly complex and non-uniform.
A high-quality geocoder allows customization of various
operational parameters in order to support a wider variety
of input reference datasets.
Configuration parameters can include such things as:
When a large number of configuration parameters
are provided,
the best way to expose them is likely
a configuration file written in
a dedicated configuration language.
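As a hypothetical illustration, the sketch below reads a small configuration file and turns it into a validated settings object. JSON is used only because Python's standard library can parse it; every key name (`fuzzy_match`, `street_type_abbreviations`, `reference_fields`) and every column name is invented for this example, not taken from any of the geocoders above.

```python
import json
from dataclasses import dataclass
from typing import Dict

# Hypothetical configuration file contents; every key is invented for illustration.
CONFIG_TEXT = """
{
  "fuzzy_match": {"method": "soundex", "min_score": 0.75},
  "street_type_abbreviations": {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"},
  "reference_fields": {"name": "STREET_NM", "left_from": "L_FROM", "left_to": "L_TO"}
}
"""

SUPPORTED_METHODS = {"soundex", "metaphone", "bigram"}

@dataclass
class GeocoderConfig:
    fuzzy_method: str
    min_score: float
    abbreviations: Dict[str, str]     # full street type -> standard abbreviation
    reference_fields: Dict[str, str]  # logical field -> column in the reference dataset

def load_config(text: str) -> GeocoderConfig:
    """Parse and validate the (hypothetical) configuration format."""
    raw = json.loads(text)
    fuzzy = raw["fuzzy_match"]
    if fuzzy["method"] not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported fuzzy match method: {fuzzy['method']}")
    return GeocoderConfig(
        fuzzy_method=fuzzy["method"],
        min_score=float(fuzzy["min_score"]),
        abbreviations=raw["street_type_abbreviations"],
        reference_fields=raw["reference_fields"],
    )

config = load_config(CONFIG_TEXT)
print(config.abbreviations["AVENUE"])  # AVE
```

Because the parameters live in a file, adapting the geocoder to a differently structured reference dataset means editing the configuration rather than the code.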