Open Semantic Search Appliance (search server VM)

Search engine as a virtual appliance: a preconfigured virtual machine with an open source search server for teams

The free software Open Semantic Search Appliance is the all-in-one Open Semantic Search package (including the Solr search server, open source tools and search user interfaces) as a virtual machine image. It is intended for teams that have a Windows or Linux server running a virtual machine host like VirtualBox, VMware, QEMU or KVM, and it provides search, analysis and document mining across many documents.

Installation of the virtual machine

Configuration of document folders

Edit the settings of the new virtual machine (choose the virtual machine in the left sidebar and click the "settings" button in the top bar).

Add shared folders with documents:

You can add one or more shared folders pointing to local folders on your hard disk that contain the documents to be indexed, searched and analysed.

Activate the option Auto-mount.

These folders can be set to read-only, so that you cannot accidentally delete important documents from inside the virtual machine.
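The shared folder settings described above can also be scripted on the host with VirtualBox's VBoxManage tool instead of the settings dialog. The following is a minimal sketch, not part of the appliance: the VM name "Open Semantic Search" and the host path are placeholders (assumptions) that you have to replace with your own values, and the helper name add_readonly_share is made up for illustration. By default the script only prints the command; set VBOXMANAGE=VBoxManage to actually run it.

```shell
#!/bin/sh
# Dry run by default: the command is only printed.
# Set VBOXMANAGE=VBoxManage in the environment to really apply it.
VBOXMANAGE="${VBOXMANAGE:-echo VBoxManage}"

# Placeholder values (assumptions), adjust to your setup.
VM_NAME="${VM_NAME:-Open Semantic Search}"
HOST_PATH="${HOST_PATH:-$HOME/Documents}"

# Add a read-only, auto-mounted shared folder to a VM.
# Arguments: VM name, share name, folder on the host.
add_readonly_share() {
    $VBOXMANAGE sharedfolder add "$1" --name "$2" --hostpath "$3" \
        --automount --readonly
}

add_readonly_share "$VM_NAME" documents "$HOST_PATH"
```

With the share name documents and Auto-mount enabled, the folder appears inside the virtual machine as /media/sf_documents.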

If you don't want to use VirtualBox shared folders for access to the documents, or if you use a virtualization solution other than VirtualBox, you can set up (mount) network file shares in the virtual machine's configuration.

Processor and RAM settings

If available, set more RAM than the preconfigured 4096 MB.

If you have multiple free processors, give the virtual machine access to more than the preconfigured two processors in the config tab System / Processor. This enables parallel processing, so the indexing and analysis processes will be much faster.
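As a sketch of the same settings from the host's command line, VBoxManage modifyvm can set memory and processor count while the virtual machine is powered off. The VM name and the values 8192 MB and 4 processors are example assumptions, and set_vm_resources is a hypothetical helper name; pick values that fit your host. The script prints the command unless VBOXMANAGE=VBoxManage is set.

```shell
#!/bin/sh
# Dry run by default: the command is only printed.
# Set VBOXMANAGE=VBoxManage in the environment to really apply it.
VBOXMANAGE="${VBOXMANAGE:-echo VBoxManage}"

# Placeholder VM name (assumption), adjust to your setup.
VM_NAME="${VM_NAME:-Open Semantic Search}"

# Set RAM (in MB) and number of virtual processors of a powered-off VM.
# Arguments: VM name, memory in MB, number of CPUs.
set_vm_resources() {
    $VBOXMANAGE modifyvm "$1" --memory "$2" --cpus "$3"
}

# Example values only: 8 GB RAM and four processors.
set_vm_resources "$VM_NAME" 8192 4
```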

Network configuration

Give your clients access to the IP and port 80 of the virtual machine. To do so, you can for example set up port forwarding to port 80 of the guest in "extended network settings" of the virtual machine.
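If the virtual machine uses NAT networking on its first network adapter (an assumption about your setup), such a port forwarding rule can also be sketched with VBoxManage. The host port 8080, the rule name search-http and the helper name forward_http are chosen only for illustration. The rule format is name,protocol,host IP,host port,guest IP,guest port; empty IP fields mean "any". The script prints the command unless VBOXMANAGE=VBoxManage is set.

```shell
#!/bin/sh
# Dry run by default: the command is only printed.
# Set VBOXMANAGE=VBoxManage in the environment to really apply it.
VBOXMANAGE="${VBOXMANAGE:-echo VBoxManage}"

# Placeholder VM name (assumption), adjust to your setup.
VM_NAME="${VM_NAME:-Open Semantic Search}"

# Forward a TCP port of the host to port 80 of the guest (NAT adapter 1).
# Arguments: VM name, port on the host.
forward_http() {
    $VBOXMANAGE modifyvm "$1" --natpf1 "search-http,tcp,,$2,,80"
}

# Example: host port 8080 -> guest port 80 (VM must be powered off).
forward_http "$VM_NAME" 8080
```

With this example rule, clients would reach the web server of the virtual machine on port 8080 of the host.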

External index (optional)

If you have so much data that the few dozen gigabytes available for the index inside the virtual machine are not enough, and you don't want to increase the VM image size, you can store the index in a directory or filesystem outside the virtual machine.

To do so, add a shared folder named index (activate the option Auto-mount).

Don't set this folder to read only, since the search engine has to store the index in it.

Create a directory tmp inside this shared folder index.
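The steps for the external index could be sketched on the host as follows. This is only an illustration under assumptions: the VM name, the host directory $HOME/search-index and the helper name prepare_index_share are placeholders. The tmp directory is created for real, while the VBoxManage command is only printed unless VBOXMANAGE=VBoxManage is set.

```shell
#!/bin/sh
# Dry run by default for VBoxManage: the command is only printed.
# Set VBOXMANAGE=VBoxManage in the environment to really apply it.
VBOXMANAGE="${VBOXMANAGE:-echo VBoxManage}"

# Placeholder values (assumptions), adjust to your setup.
VM_NAME="${VM_NAME:-Open Semantic Search}"
INDEX_DIR="${INDEX_DIR:-$HOME/search-index}"

# Create the host directory with its tmp subdirectory, then add it as a
# writable (not read-only), auto-mounted shared folder named "index".
# Arguments: VM name, directory on the host.
prepare_index_share() {
    mkdir -p "$2/tmp" || return 1
    $VBOXMANAGE sharedfolder add "$1" --name index --hostpath "$2" --automount
}

prepare_index_share "$VM_NAME" "$INDEX_DIR"
```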

Index documents

Web user interface

You can use the datasources web user interface to add paths that you configured before as VirtualBox shared folders, such as /media/sf_documents, and click on "Index" after saving.

REST-API

You can use the REST-API to index a folder or files:

For example, call http://search-vm.localdomain/search-apps/api/index-dir?uri=/media/sf_documents in your browser or from the command line with curl (use the domain name or the IP address of your virtual machine instead of search-vm.localdomain) to index all documents in the configured shared folder documents.
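The call described above can be sketched with curl as follows. The host name search-vm.localdomain is the placeholder from the text and the helper name index_dir is hypothetical. curl's --get together with --data-urlencode appends the uri parameter to the URL in URL-encoded form, which helps when paths contain spaces or special characters. The script prints the command by default; set CURL=curl to send the request.

```shell
#!/bin/sh
# Dry run by default: the curl command is only printed.
# Set CURL=curl in the environment to really send the request.
CURL="${CURL:-echo curl}"

# Domain name or IP address of your virtual machine (placeholder).
SEARCH_HOST="${SEARCH_HOST:-search-vm.localdomain}"

# Ask the REST-API to index a directory that is visible inside the VM.
# Arguments: host name or IP of the VM, directory path inside the VM.
index_dir() {
    $CURL --get --data-urlencode "uri=$2" \
        "http://$1/search-apps/api/index-dir"
}

index_dir "$SEARCH_HOST" /media/sf_documents
```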

Command line tools

Or log in as root on the virtual machine console and use the command line tools:

opensemanticsearch-index-dir /media/sf_documents/

Access to search user interface

  • Launch the search user interface by opening http://search-vm.localdomain/search in your browser. Use the domain name or the IP address of your virtual machine instead of search-vm.localdomain.

Search, explore and analyse

Additional document directories or file shares

If you want to index additional file shares, you have to make them available to the virtual machine.

VirtualBox shared folders

If you use VirtualBox, you can configure shared folders so that the search engine can access the files. Read-only mode is enough to index these directories. If you enable "auto mount", such shared folders are mounted in /media/sf_sharedfoldername inside the virtual machine.

Linux mount and fuse filesystems

Additionally, you can mount any file system that Debian GNU/Linux can mount, for example a NAS, SMB, NFS, SSHFS, Amazon S3, Google Drive and many other filesystems, file shares and file storage protocols.

To do so, use the console of the virtual machine to log in as root (you do not have to enter a password, since you have access to the virtual machine host) and mount the file shares using /etc/fstab as the config file.
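As an illustration, an /etc/fstab entry for a read-only NFS share could be generated like this. Everything specific here is an assumption: the server name nas.example.org, the export path, the mount point /mnt/documents and the helper name fstab_nfs_line are hypothetical, and mounting NFS on Debian usually requires the nfs-common package. The script only prints the line and does not change /etc/fstab.

```shell
#!/bin/sh
# Print an /etc/fstab line for a read-only NFS share.
# Fields: device, mount point, filesystem type, options, dump, fsck pass.
# "ro" mounts read-only; "_netdev" marks the mount as needing the network.
# Arguments: NFS server, exported path, mount point inside the VM.
fstab_nfs_line() {
    printf '%s:%s %s nfs ro,_netdev 0 0\n' "$1" "$2" "$3"
}

# Hypothetical example values, replace with your own server and paths.
fstab_nfs_line nas.example.org /export/documents /mnt/documents

# To use the entry, as root inside the virtual machine:
#   mkdir -p /mnt/documents
#   append the printed line to /etc/fstab
#   mount /mnt/documents
```

The mounted directory can then be indexed like any other path, for example with opensemanticsearch-index-dir /mnt/documents.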

Access

If you don't secure the Apache webserver or this web application, everyone who can reach your computer's IP has full access to the search engine and therefore to all indexed contents, too.

With its API and web apps in /search-apps it is possible to index data or tag documents.

If you don't want to secure the whole Apache web server (for example by limiting access to localhost or adding password protection with htaccess), you should limit access to the search in /etc/solr-php-client-ui/apache.conf and limit access to the REST-API and web apps in /etc/opensemanticsearch-django-webapps/apache.conf ...

The initial password for the Django admin interface (i.e. for adding tags that can be used for tagging documents) is "live".

Since you, as admin, control the virtual machine host and therefore have full access to the VM's data anyway, there is no root password: just log in as root without a password on the virtual machine console.

You can set a root password with passwd root and install the package openssh-server with apt-get install openssh-server to be able to log in over the network, too.

Roadmap

Please donate for more time to work on it and release earlier:

HTTPS out of the box

Add HTTPS without manual config of Apache webserver.

Web interface for config

The configuration is based on config files in /etc/opensemanticsearch. Maybe some day there will be a web interface for that.