Creation of Custom Text Extraction Geoprocessing Tools in ArcGIS Pro

Toolbox:

Extract Location Names

Tools:

ExtractLocNameTxt

ExtractLocNameFC

These tools were developed to extract location names efficiently from unstructured .txt files or text fields, streamlining the geocoding process. They provide a simpler alternative to creating a custom location file (.lxtgaz) with the LocateXT extension in ArcGIS Pro. Creating a custom location file requires manually entering each place name you wish to extract from a dataset or .txt file, which becomes cumbersome with large datasets. The Extract Location Names tools rely on spaCy, an open-source natural language processing library. Unstructured text containing location names is run through one of spaCy’s trained English-language pipelines to identify words labeled as geopolitical entities (GPE). If a GPE is detected, its text is extracted, stored in a dictionary, and then written to a new attribute field for that feature.
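The GPE step described above can be sketched roughly as follows. This is a minimal illustration, not the tools' actual source: the function names `gpe_names` and `extract_location_names` are hypothetical, the pipeline name `en_core_web_sm` is an assumed choice of spaCy's English pipelines, and the dictionary is used here only to drop repeated names while keeping their order.

```python
from typing import Iterable, List


def gpe_names(entities: Iterable) -> List[str]:
    """Return the text of every entity labeled GPE, in order, without repeats.

    `entities` is any iterable of objects exposing `.text` and `.label_`,
    such as the `ents` attribute of a spaCy Doc.
    """
    found = {}  # dict keys preserve insertion order and discard duplicates
    for ent in entities:
        if ent.label_ == "GPE":
            found.setdefault(ent.text, None)
    return list(found)


def extract_location_names(text: str, model: str = "en_core_web_sm") -> List[str]:
    """Run `text` through a spaCy pipeline and collect its GPE entities.

    Assumes spaCy and the named pipeline are installed, for example via
    `python -m spacy download en_core_web_sm`.
    """
    import spacy  # imported here so gpe_names works without spaCy installed

    nlp = spacy.load(model)
    return gpe_names(nlp(text).ents)
```

Calling `extract_location_names` on a sentence that mentions a few cities or countries would return those names as a list; exactly which words are labeled GPE depends on the pipeline that is loaded.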

These tools add to the existing functionality of the ExtractLocationsText tool, which can extract geographic coordinates and dates from an unstructured text file. The ExtractLocNameTxt tool calls the ExtractLocationsText tool in its code to first create a feature class that contains geographic coordinates and the unstructured text associated with them (in the Pre-Text and Post-Text fields). For this reason, the ExtractLocNameTxt tool requires a LocateXT license to run.

The ExtractLocNameFC tool employs the same GPE recognition as the ExtractLocNameTxt tool, but instead asks the user to specify a text field in an existing feature class on which to perform location name extraction. The ExtractLocNameFC tool does not require a LocateXT license to run. Both tools require the user to install the spaCy package and download a default trained English pipeline. Instructions for this setup are provided in the user documentation.
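As a rough sketch of the feature class workflow described above, the following reads a user-specified text field and writes the extracted names to a new attribute field. It is an illustration under stated assumptions rather than the tool's code: `field_value` and `add_location_names` are hypothetical names, the output field name, the `"; "` separator, and the 255-character field length are assumed, and `extract` stands for any function that maps a string to a list of location names.

```python
from typing import Callable, List


def field_value(names: List[str], max_length: int = 255, sep: str = "; ") -> str:
    """Join names into one string, dropping names that would exceed max_length."""
    kept: List[str] = []
    length = 0
    for name in names:
        extra = len(name) + (len(sep) if kept else 0)
        if length + extra > max_length:
            break  # stop before overflowing the text field
        kept.append(name)
        length += extra
    return sep.join(kept)


def add_location_names(
    feature_class: str,
    text_field: str,
    extract: Callable[[str], List[str]],
    out_field: str = "LocNames",  # hypothetical output field name
    max_length: int = 255,
) -> None:
    """Add a text field and fill it with names extracted from text_field."""
    import arcpy  # only available inside an ArcGIS Pro Python environment

    arcpy.management.AddField(
        feature_class, out_field, "TEXT", field_length=max_length
    )
    with arcpy.da.UpdateCursor(feature_class, [text_field, out_field]) as cursor:
        for row in cursor:
            names = extract(row[0]) if row[0] else []  # skip null or empty text
            row[1] = field_value(names, max_length)
            cursor.updateRow(row)
```

Keeping the string formatting in `field_value` separate from the cursor loop means the part that does not depend on ArcGIS Pro can be checked on its own.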


Screenshot of User Interface of Geoprocessing tool parameters in ArcGIS Pro


 