Configuration

Standard installations and most components should work out of the box without further configuration.
You do not have to configure the advanced options described below, but you can (e.g. to enable OCR).

User interface

Config file

The config file of the user interface is /var/www/search/config/config.php

UI language

The default language is English.

To switch the language of the user interface to German, set the option $language to de:

# Set language to German
$language = "de";

Scheduler: Starting (re)crawls automatically

If you do not use file monitoring, you should recrawl your data sources automatically from time to time. Even with file monitoring, an occasional recrawl is advisable in case something failed or a file changed at a bad moment.

If you use our connectors and want the most flexibility, use cron: add a cron job that runs our command line tools to a crontab, or call our web service (REST API) from a script or another web service (e.g. webcron).
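As a rough illustration of the "call the web service from a script" option, the following minimal sketch builds a recrawl request and sends it, so that cron can run it on a schedule. The endpoint URL, the parameter name uri and the script path in the crontab comment are hypothetical placeholders, not the documented API; replace them with the values of your installation.

```python
"""Sketch: trigger a recrawl via a web service call, intended to be run from cron."""
import sys
import urllib.parse
import urllib.request

# ASSUMPTION: hypothetical endpoint and parameter name, adjust to your installation.
API_ENDPOINT = "http://localhost/search-api/recrawl"


def build_recrawl_url(directory, endpoint=API_ENDPOINT):
    """Return the request URL asking the service to (re)index a directory."""
    query = urllib.parse.urlencode({"uri": directory})
    return f"{endpoint}?{query}"


def trigger_recrawl(directory, timeout=60):
    """Send the request and return the HTTP status code."""
    url = build_recrawl_url(directory)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example crontab line (nightly at 02:00); the script path is hypothetical:
    # 0 2 * * * /usr/bin/python3 /usr/local/bin/recrawl.py /var/documents
    sys.exit(0 if trigger_recrawl(sys.argv[1]) == 200 else 1)
```

The exit code lets cron (or a wrapper script) detect failed runs, which matters because a recrawl is often the safety net for earlier failures.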

If you use Apache ManifoldCF for imports, it has a built-in scheduler. Just set the schedule in its web admin interface.

File indexer

Config file

Config file for indexing files: /etc/solr/solr-connector-files

OCR (text recognition in graphical formats)

OCR is off by default because it slows down indexing: it uses a lot of processor resources and needs many seconds for each image file.

Enable OCR

Install the package tesseract-ocr (available in your Linux distribution's repositories):

apt-get install tesseract-ocr

Set option enable_ocr in the global config /etc/solr/enhancer_ocr or in the config of the connector:

#Enable OCR
enable_ocr=True

If you enable OCR, you should also enable OCR for images inside PDF files, since many PDF files are scans and contain much of their text only as images. To do so, set the option enable_ocr_pdf:

#Enable OCR for images inside PDF files
enable_ocr_pdf=True
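Since both options only make sense when Tesseract is installed, a small check can help before switching them on. The sketch below is an illustration and not part of the connector: the helper names ocr_available and ocr_settings are hypothetical, and it assumes the tesseract-ocr package provides an executable named tesseract on the PATH.

```python
import shutil


def ocr_available(binary="tesseract"):
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(binary) is not None


def ocr_settings(binary="tesseract"):
    """Return both OCR options, enabled only if the executable exists."""
    available = ocr_available(binary)
    return {"enable_ocr": available, "enable_ocr_pdf": available}


if __name__ == "__main__":
    # Prints the values you could put into the config file.
    print(ocr_settings())
```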

OCR language

To set the OCR language to a language other than English:

  1. Install the Tesseract language package (for German: tesseract-ocr-deu). See the list of available languages for Debian or Ubuntu.
  2. Set the option ocr_language to the language of your documents. The default is eng for English (Tesseract uses eng, not en). For German, set deu (not de):

    # language for automatic text recognition (ocr)
    #ocr_language="eng"
    ocr_language="deu"
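Because Tesseract expects three-letter codes, mixing them up with two-letter codes such as de is an easy mistake. The following sketch shows one way to translate a few common two-letter codes and to derive the matching Debian/Ubuntu package name. The table is deliberately small and the function names are hypothetical; they are not part of the connector.

```python
# Two-letter language codes mapped to the three-letter codes Tesseract uses.
# Only a small sample; extend it with the languages you need.
TESSERACT_CODES = {"en": "eng", "de": "deu", "fr": "fra", "es": "spa"}


def tesseract_language(code):
    """Return the Tesseract code for a two-letter or three-letter code."""
    code = code.lower()
    if code in TESSERACT_CODES.values():
        return code  # already a known Tesseract code
    if code in TESSERACT_CODES:
        return TESSERACT_CODES[code]
    raise ValueError(f"Unknown language code: {code!r}")


def language_package(code):
    """Return the name of the Debian/Ubuntu language package."""
    return "tesseract-ocr-" + tesseract_language(code)


if __name__ == "__main__":
    print(f'ocr_language="{tesseract_language("de")}"')
    print(language_package("de"))
```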

Enhancer: Enrich content

Enhancers enrich the content with additional data or metadata, which makes the original content easier to find.

In /etc/solr/enhancer-rdf you can configure servers or services for metadata (such as annotations or tags) that is accessible in the open standard RDF (Resource Description Framework).

Adding custom fields / custom facets

To allow external, independent and modular tools and components to write directly to the Solr index, mappings of new fields are configured in more than one place:

  • your metadata platform (e.g. Drupal) where you edit the metadata, or a scraping platform where you read it, which stores the data in named fields
  • the connectors / importers and enhancers, which read this data from the (meta)data source and write it to Solr
  • the user interface, which reads these fields from Solr and shows them to the user under a human-readable label.

You can configure additional facets (interactive filters):

Mapping database fields or RDF properties to custom fields / custom facets in Solr within the connector / importer / enhancer

Configure which Solr fields receive the additional data from the (meta)data source:

If you use an RDF data source, find out the name / URI of the external fields (in the Semantic Web and RDF, the "field name" is a URL, too). These can be standard fields (such as Dublin Core metadata fields) or custom fields (e.g. in Drupal).

Add these external field names or URIs to /etc/solr/enhancer-rdf, mapping each to an internal Solr field (a standard field, a standard facet or an additional custom facet).

Example:
meta_property2facet = {
'http://purl.org/dc/terms/location': 'location_ss',
'http://semantic-mediawiki.org/swivt/1.0#specialProperty_dat': 'meta_date_dts'
}
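To make the effect of such a mapping concrete, here is a minimal sketch of how property/value pairs read from an RDF source could be turned into Solr fields. It illustrates the idea only and is not the enhancer's actual implementation: the function map_properties and the sample values are hypothetical, and unmapped properties are simply skipped.

```python
meta_property2facet = {
    'http://purl.org/dc/terms/location': 'location_ss',
    'http://semantic-mediawiki.org/swivt/1.0#specialProperty_dat': 'meta_date_dts',
}


def map_properties(rdf_properties, mapping=meta_property2facet):
    """Convert (property URI, value) pairs into a dict of Solr fields.

    Values of the same field are collected in a list, so a property that
    occurs several times yields a multi-valued field.
    """
    document = {}
    for uri, value in rdf_properties:
        field = mapping.get(uri)
        if field is None:
            continue  # property is not mapped: ignored in this sketch
        document.setdefault(field, []).append(value)
    return document


if __name__ == "__main__":
    sample = [
        ('http://purl.org/dc/terms/location', 'Berlin'),
        ('http://example.org/unmapped', 'ignored'),
        ('http://purl.org/dc/terms/location', 'Hamburg'),
    ]
    print(map_properties(sample))
```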

Enable additional Solr custom fields / custom facets in the user interface

Configure which of these new or additional Solr fields are shown in the user interface and under which caption:

Use the option $cfg['facets'] in config/config.php to add custom facets to the user interface:
List the additional Solr fields and map each of them to a human-readable title / label. Such fields are filled e.g. by your configured connectors / importers or enhancers, such as additional RDF metadata sources (e.g. your tagging and annotation metadata server), or they are fields to which you write scraped data.

Example:
// Additional facets (e.g. fields imported by a connector or enhancer that should be shown as interactive filters in the sidebar)
$cfg['facets']['yourfacet_s'] = array ('label'=>'Additional own facet');
$cfg['facets']['anotherfacet_s'] = array ('label'=>'Another additional facet');