Friday, 2 June 2023

Open Search Server, a Java-based search engine



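Open Search Server (OSS) is an open-source search engine and full-text search suite written in Java. It can index documents in multiple languages: a multilingual analyzer splits sentences into words, then applies a lemmatization algorithm to those words according to the document's language. It supports many document formats, including XML, HTML, PDF, Word and PowerPoint. It also comes with an easy-to-use web interface.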
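
The two analysis steps described above (split a sentence into words, then reduce each word to its lemma according to the document's language) can be pictured with a small Java sketch. This is not OpenSearchServer's code: the class name `AnalyzerSketch`, the method names and the tiny lemma tables are all made up for illustration, and a real analyzer relies on proper linguistic resources rather than a hand-written map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class AnalyzerSketch {
    // Toy lemma tables, one per language code (hypothetical data, illustration only).
    private static final Map<String, Map<String, String>> LEMMAS = Map.of(
        "en", Map.of("engines", "engine", "indexed", "index", "documents", "document"),
        "fr", Map.of("moteurs", "moteur", "documents", "document"));

    // Step 1: split a sentence into lower-cased word tokens.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String t : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Step 2: replace each token by its lemma, using the table of the document's language.
    // Unknown words and unknown languages are passed through unchanged.
    static List<String> analyze(String sentence, String lang) {
        Map<String, String> table = LEMMAS.getOrDefault(lang, Map.of());
        List<String> out = new ArrayList<>();
        for (String t : tokenize(sentence)) {
            out.add(table.getOrDefault(t, t));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Search engines indexed the documents.", "en"));
        System.out.println(analyze("Les moteurs indexent les documents", "fr"));
    }
}
```

The point of the language argument is that the same surface word can need a different treatment in each language, which is why the lemma table is chosen per document rather than globally.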

Official download: https://cloud.opensearchserver.com/opensearchserver#download

https://www.opensearchserver.com/

https://www.opensearchserver.com/documentation/README.md

https://www.opensearchserver.com/documentation/installation/linux.md
-----------------------------------------------------------------------------------------

Open-source Enterprise Grade Search Engine Software

www.opensearchserver.com

OpenSearchServer

Join the chat at https://gitter.im/jaeksoft/opensearchserver

OpenSearchServer is a powerful, enterprise-class search engine based on Lucene. Using the web user interface, the crawlers (web, file, database, ...) and the JSON web service, you can quickly and easily integrate advanced full-text search capabilities into your application. OpenSearchServer runs on Linux/Unix/BSD/Windows.

Quickstart

Use the web interface and/or the API:

http://localhost:9090

Features

Search functions

  • Advanced full-text search features
  • Phonetic search
  • Advanced boolean search with query language
  • Clustered results with faceting and collapsing
  • Filter search using sub-requests (including negative filters)
  • Geolocation
  • Spell-checking
  • Relevance customization
  • Search suggestion facility (auto-completion)
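
To give an idea of what "faceting" and "collapsing" mean in the list above, here is a minimal Java sketch. It is an illustration only and not taken from OpenSearchServer: the `Hit` class, the `site` field and both method names are hypothetical, and OpenSearchServer's own collapsing modes may behave differently from the simple "first hit per value" rule used here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FacetCollapseSketch {
    // A search hit reduced to two illustrative fields (hypothetical names).
    static final class Hit {
        final String title;
        final String site;

        Hit(String title, String site) {
            this.title = title;
            this.site = site;
        }
    }

    // Faceting: count the hits for each distinct value of the "site" field.
    static Map<String, Integer> facetBySite(List<Hit> hits) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Hit h : hits) {
            counts.merge(h.site, 1, Integer::sum);
        }
        return counts;
    }

    // Collapsing: keep only the first (best-ranked) hit for each distinct site.
    static List<String> collapseBySite(List<Hit> rankedHits) {
        Set<String> seen = new HashSet<>();
        List<String> titles = new ArrayList<>();
        for (Hit h : rankedHits) {
            if (seen.add(h.site)) {
                titles.add(h.title);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("Install guide", "docs"),
            new Hit("API reference", "docs"),
            new Hit("Release notes", "blog"));
        System.out.println(facetBySite(hits));
        System.out.println(collapseBySite(hits));
    }
}
```

Facet counts are what a search page typically shows next to each filter value, while collapsing keeps one site from flooding the first page of results.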

Indexation

  • Supports 18 languages
  • Fields schema with analyzers in each language
  • Several filters: n-gram, lemmatization, shingle, diacritic stripping, …
  • Automatic language recognition
  • Named entity recognition
  • Word synonyms and expression synonyms
  • Export indexed terms with frequencies
  • Automatic classification
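
Three of the filters named in the list above (n-gram, shingle, diacritic stripping) are easy to picture with a few lines of Java. The sketch below uses only the JDK; the class and method names are hypothetical, and it shows the general techniques rather than OpenSearchServer's or Lucene's actual filter classes.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class TokenFilterSketch {
    // Character n-grams: every substring of length n inside one term.
    static List<String> charNGrams(String term, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }

    // Shingles: every run of `size` consecutive tokens, joined with a space.
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + size)));
        }
        return out;
    }

    // Diacritic stripping: decompose accented characters (NFD), then drop the combining marks.
    static String stripDiacritics(String term) {
        return Normalizer.normalize(term, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(charNGrams("search", 3));
        System.out.println(shingles(List.of("open", "search", "server"), 2));
        System.out.println(stripDiacritics("caf\u00e9"));
    }
}
```

N-grams help with partial-word matching, shingles with phrase matching, and diacritic stripping lets a query typed without accents still find accented words.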

Supported documents

  • HTML / XHTML
  • MS Office documents (Word, Excel, PowerPoint, Visio, Publisher)
  • OpenOffice documents
  • Adobe PDF (with OCR)
  • RTF, Plaintext
  • Audio files metadata (wav, mp3, AIFF, Ogg)
  • Torrent files
  • OCR over images

Crawlers

  • The web crawler for internet, extranet and intranet
  • The file systems crawler for local and remote files (NFS, SMB/CIFS, FTP, FTPS, SWIFT)
  • The database crawler for all JDBC databases (MySQL, PostgreSQL, Oracle, SQL Server, …)
  • Filter inclusion or exclusion with wildcards
  • Session parameters removal
  • SQL join and linked files support
  • Screenshot capture
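
Two of the crawler features above, wildcard inclusion/exclusion and session parameter removal, can be sketched in plain Java as follows. Everything here is an assumption made for illustration: the class and method names, the `*` wildcard semantics and the parameter names in `main` are not taken from OpenSearchServer, whose own pattern syntax and URL handling may differ. The sketch also ignores URL fragments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class CrawlFilterSketch {
    // Turn a wildcard pattern ('*' matches any run of characters) into a regex.
    static Pattern wildcardToRegex(String wildcard) {
        String[] parts = wildcard.split("\\*", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(".*");
            }
            sb.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(sb.toString());
    }

    static boolean matchesAny(String url, List<String> patterns) {
        return patterns.stream().anyMatch(p -> wildcardToRegex(p).matcher(url).matches());
    }

    // A URL is accepted if it matches at least one inclusion pattern and no exclusion pattern.
    static boolean accept(String url, List<String> include, List<String> exclude) {
        return matchesAny(url, include) && !matchesAny(url, exclude);
    }

    // Drop the named query parameters (e.g. session ids) so one page is not seen as many URLs.
    static String removeParams(String url, Set<String> names) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url;
        }
        List<String> kept = new ArrayList<>();
        for (String pair : url.substring(q + 1).split("&")) {
            String name = pair.split("=", 2)[0];
            if (!pair.isEmpty() && !names.contains(name)) {
                kept.add(pair);
            }
        }
        String base = url.substring(0, q);
        return kept.isEmpty() ? base : base + "?" + String.join("&", kept);
    }

    public static void main(String[] args) {
        List<String> include = List.of("http://www.example.com/*");
        List<String> exclude = List.of("*/private/*");
        System.out.println(accept("http://www.example.com/docs/a.html", include, exclude));
        System.out.println(accept("http://www.example.com/private/a.html", include, exclude));
        System.out.println(removeParams("http://www.example.com/page?id=7&PHPSESSID=abc123",
            Set.of("PHPSESSID", "jsessionid")));
    }
}
```

Quoting each literal part with `Pattern.quote` keeps characters such as `.` and `?` in URLs from being read as regex operators.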

General

  • JSON web service
  • Index replication and sharding
  • Federated search

from https://github.com/jaeksoft/opensearchserver

----------------------------------------------------------------

How to build OpenSearchServer

Would you like to contribute to OpenSearchServer?

Here is how to compile and build OSS.

Prerequisites

Here are the tools you need to build OpenSearchServer:

  • To build the war: Maven.
  • To build the archive package (zip and tar.gz): Ant.

Get the source code using Git

The default and currently active branch is 1.5.

git clone https://github.com/jaeksoft/opensearchserver

Go to the opensearchserver directory

cd opensearchserver

Use Maven to build the jar, war, deb and rpm packages

mvn -Dgpg.skip=true clean package rpm:attached-rpm

Use Ant to build the zip and tar.gz packages

The archive includes Apache Tomcat, as well as the start and stop scripts.

ant clean dist dist-src

The built zip and tar.gz archives are available here:

dist/opensearchserver.tar.gz
dist/opensearchserver.zip

Alternatively, you can download these packages from SourceForge.

from https://www.opensearchserver.com/documentation/building_opensearchserver.md

-------------------------------------------------

A similar program, Elasticsearch:

https://briteming.blogspot.com/2016/06/javaelasticsearch.html

https://briteming.blogspot.com/2022/07/elasticsearch-java.html

