title: Structuring data and documents: Metadata management of Names & Concepts by Lists, Vocabulary, Thesaurus and Ontologies and Named Entity Recognition
authors:
- Markus Mandalka


Structuring data and documents: Metadata management of Names & Concepts by Lists, Vocabulary, Thesaurus and Ontologies and Named Entity Recognition

Use lists, dictionaries, vocabularies, taxonomies, mindmaps, thesauri (SKOS) and graphs (ontologies) to structure your data and to filter, navigate, cluster and aggregate your documents

You can structure, cluster and filter your data with different methods and with structured data such as lists of names or concepts (named entities) or ontologies built on named entities:

Named Entities: Names of people, organizations, places or concepts

Named entities are, for example, names of people of interest, organizations such as companies, places such as towns, or important concepts or words. You can manage named entities (names, concepts, persons, places, locations) name by name in the Thesaurus.

Named entity recognition uses machine learning to add some entities that are not yet known.

Container formats for lists of named entities, vocabularies or ontologies

Multiple named entities can be stored and organized in dictionaries, vocabularies, databases, lists or ontologies.

You can therefore import external data sources containing many named entities with the Lists, Vocabularies & Ontologies manager.
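As a rough illustration of how a list of named entities can be used to tag a document, the following minimal sketch matches names from a few lists against a text. The lists, the category names and the function `find_entities` are hypothetical and chosen only for this example; they are not the implementation or data format of the Lists, Vocabularies & Ontologies manager.

```python
import re

# Hypothetical entity lists; in practice these would come from an imported
# dictionary, vocabulary, database, list or ontology.
ENTITY_LISTS = {
    "person": ["Ada Lovelace", "Alan Turing"],
    "organization": ["Example Corp"],
    "place": ["Berlin", "London"],
}


def find_entities(text, entity_lists=ENTITY_LISTS):
    """Return {category: sorted names} for every listed name found in text."""
    found = {}
    for category, names in entity_lists.items():
        hits = set()
        for name in names:
            # Case-insensitive match of the literal name as a whole word,
            # so "London" does not match inside "Londoner".
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                hits.add(name)
        if hits:
            found[category] = sorted(hits)
    return found


doc = "Alan Turing visited London; ada lovelace did not work for Example Corp."
print(find_entities(doc))
```

The resulting categories and names could then serve as filters or facets for the document. Names that are on no list are not found this way; that gap is what named entity recognition is meant to cover.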

Structure

Based on such named entities or categories, you can structure the documents that contain these names by the following methods:

Categories, groups or lists (classification or tagging by membership in a list or ontology, tagging by rules and queries, or clustering by machine learning)

Hierarchies, Trees or Mindmaps (Taxonomies)

Network or graph of concepts / words (connected words and concepts in the Thesaurus). Open standard format: Simple Knowledge Organization System (SKOS)

Example data: Custom domain thesaurus or linked open data from Wiktionary

Other domain-specific or private ontologies. Open standard format: RDFS or OWL

Manual tagging and annotation

Tagging by machine learning (Automatic classification or clustering)

Links (Connections) or Networks

Rules: grammar rules and grammar heuristics (stemming)
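To make the combination of lists, hierarchies and connected concepts more concrete, here is a minimal sketch of a thesaurus whose entries have alternative labels and a broader concept, loosely modelled on the SKOS ideas of preferred labels, alternative labels and broader relations. The data structure, the sample concepts and the functions `ancestors` and `tag_document` are assumptions made for this example, not the SKOS file format or an existing API.

```python
import re

# Hypothetical mini thesaurus: the key is the preferred label, "alt_labels"
# are synonyms, "broader" points to the parent concept in the taxonomy.
THESAURUS = {
    "vehicle": {"alt_labels": [], "broader": None},
    "car": {"alt_labels": ["automobile", "motorcar"], "broader": "vehicle"},
    "electric car": {"alt_labels": ["EV"], "broader": "car"},
}


def ancestors(concept, thesaurus=THESAURUS):
    """Return the chain of broader concepts, nearest parent first."""
    path = []
    seen = {concept}
    parent = thesaurus[concept]["broader"]
    # The "seen" set guards against accidental cycles in the data.
    while parent is not None and parent not in seen:
        path.append(parent)
        seen.add(parent)
        parent = thesaurus[parent]["broader"]
    return path


def tag_document(text, thesaurus=THESAURUS):
    """Tag text with every matched concept and all of its broader concepts."""
    lowered = text.lower()
    tags = set()
    for concept, entry in thesaurus.items():
        labels = [concept] + entry["alt_labels"]
        for label in labels:
            # Whole-word match of the preferred label or one of its synonyms.
            pattern = r"(?<!\w)" + re.escape(label.lower()) + r"(?!\w)"
            if re.search(pattern, lowered):
                tags.add(concept)
                tags.update(ancestors(concept))
                break
    return sorted(tags)


print(tag_document("She bought a used automobile."))
print(tag_document("An EV charging station"))
```

In this sketch a document that mentions only a synonym is still tagged with the concept and with everything above it in the hierarchy, so a filter on the broader concept also finds it. Inflected forms such as "cars" are not matched here; covering them is the role of grammar rules such as stemming.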