
Friday, 19 May 2023

Yahoo Open-Sources Vespa, Its Commercial-Grade Search Engine



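Yahoo has open-sourced Vespa, its commercial-grade big data processing and serving engine. Vespa's predecessor was AlltheWeb, a Norwegian company that Yahoo acquired in 2003. In the years after acquiring alltheweb.com, the team rewrote most of the engine from scratch, carrying its experience over to a modern technology platform. Beyond search, its architecture and features include large-scale clustering, high-performance retrieval, hard real-time operation, and powerful ranking.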

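Vespa is used by Yahoo.com, Yahoo News, Yahoo Sports, Yahoo Finance, Yahoo Gemini, Flickr and many other products. It processes and serves billions of document access requests per day while also answering search queries and providing recommendations, personalized content and ads. Vespa processes and serves content and ads about 90,000 times per second, with latencies under a few tens of milliseconds. On Flickr, for example, Vespa runs keyword and image searches over tens of billions of images at hundreds of queries per second. In addition, through Yahoo Gemini, Vespa serves more than 3 billion native ad requests per day, at 140k requests per second.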

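With Vespa, developers can focus on building features that compute over large data sets in real time. By following the documentation, a programmer can have an application up and running in less than ten minutes.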


---------------------------------------------------------------

The open big data serving engine: store, search, organize and make machine-learned inferences over big data at serving time. https://vespa.ai

This is the primary repository for Vespa where all development is happening. New production releases from this repository's master branch are made each weekday from Monday through Thursday.

Background

Use cases such as search, recommendation and personalization need to select a subset of data in a large corpus, evaluate machine-learned models over the selected data, organize and aggregate it and return it, typically in less than 100 milliseconds, all while the data corpus is continuously changing.

This is hard to do, especially with large data sets that need to be distributed over multiple nodes and evaluated in parallel. Vespa is a platform that performs these operations for you with high availability and performance. It has been in development for many years and is used by a number of large internet services and apps, which serve hundreds of thousands of queries from Vespa per second.
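The select, evaluate, organize and return steps described above can be pictured with a toy, single-process sketch. This is not Vespa's API or implementation: the `Doc` type, the `select`, `score` and `serve` functions, the `popularity` feature and the fixed weights are all hypothetical stand-ins, and a real deployment would distribute each step across many nodes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    category: str
    text: str
    popularity: float  # hypothetical feature fed to the toy model


def select(corpus, term):
    """Select the subset of documents whose text contains the query term."""
    return [d for d in corpus if term in d.text.lower().split()]


def score(doc, term):
    """Stand-in for a machine-learned model: a fixed linear combination."""
    term_freq = doc.text.lower().split().count(term)
    return 0.7 * term_freq + 0.3 * doc.popularity


def serve(corpus, term, k=2):
    """Select matching documents, rank them, aggregate, and return top-k ids."""
    hits = select(corpus, term)
    ranked = sorted(hits, key=lambda d: score(d, term), reverse=True)
    per_category = Counter(d.category for d in hits)  # simple aggregation
    return [d.doc_id for d in ranked[:k]], dict(per_category)


corpus = [
    Doc("a", "news", "vespa serves search queries", 0.2),
    Doc("b", "sports", "search search and more search", 0.1),
    Doc("c", "news", "unrelated story", 0.9),
    Doc("d", "finance", "search for quotes", 0.8),
]

top, counts = serve(corpus, "search")
print(top, counts)
```

The sketch scans the whole corpus for every query, which is exactly what stops working at scale; the point of a serving engine is to do the same selection and ranking over indexed, partitioned data within the latency budget.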

Install

Run your own Vespa instance: https://docs.vespa.ai/en/getting-started.html

Or deploy your Vespa applications to the cloud service: https://cloud.vespa.ai

Usage

  • The application created in the getting started guide is fully functional and production ready, but you may want to add more nodes for redundancy.
  • See developing applications on adding your own Java components to your Vespa application.
  • See Vespa APIs to understand how to interface with Vespa.
  • Explore the sample applications
  • Follow the Vespa Blog for feature updates / use cases

Full documentation is at https://docs.vespa.ai.
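As a rough illustration of interfacing with Vespa over HTTP, the sketch below only builds a request URL for the query API; it does not contact a server. It assumes a local instance listening on `localhost:8080` and a schema named `music` with an `album` field, none of which are stated in this post, and `build_query_url` is a hypothetical helper rather than part of any Vespa client library.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(yql, hits=10, endpoint="http://localhost:8080"):
    """Build a URL for the /search/ endpoint (endpoint value is an assumption)."""
    query = urlencode({"yql": yql, "hits": hits})
    return f"{endpoint}/search/?{query}"


# Hypothetical schema and field names, used for illustration only.
yql = 'select * from music where album contains "head"'
url = build_query_url(yql, hits=5)

# Round-trip the query string to confirm the parameters were encoded intact.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded)
```

The resulting URL could be passed to any HTTP client once an instance is running; check the Vespa API documentation for the parameters your version supports.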

Contribute

We welcome contributions! See CONTRIBUTING.md to learn how to contribute.

If you want to contribute to the documentation, see https://github.com/vespa-engine/documentation

Building

You do not need to build Vespa to use it, but if you want to contribute you need to be able to build the code. This section explains how to build and test Vespa. To understand where to make changes, see Code-map.md. Some suggested improvements with pointers to code are in TODO.md.

Development environment

C++ and Java building is supported on CentOS Stream 8. The Java source can also be built on any platform having Java 17 and Maven installed. Use the following guide to set up a complete development environment using Docker for building Vespa, running unit tests and running system tests: Vespa development on CentOS Stream 8.

Build Java modules

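# Set the JVM heap size used by Maven
export MAVEN_OPTS="-Xms128m -Xmx1024m"
# Prepare the build for the Java modules only
./bootstrap.sh java
# Build and install, using one thread per CPU core
mvn install --threads 1C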

Use this if you only need to build the Java modules; otherwise, follow the complete development guide above.

From https://github.com/vespa-engine/vespa (see also https://docs.vespa.ai/ and https://docs.vespa.ai/en/vespa-cli.html)
