A First Look at LlamaIndex

Background

A framework for implementing RAG: putting large language models to work on your own data.

LlamaIndex serves as a bridge between your data and Large Language Models (LLMs), providing a toolkit that enables you to establish a query interface around your data for a variety of tasks, such as question-answering and summarization.

Hands-on

Demo

Installation and setup

pip install llama-index

# If you use OpenAI (the current default), set the environment variable
export OPENAI_API_KEY=XXXXX

A minimal demo

# Indexing and the LLM calls are both wrapped internally, and OpenAI is used by default

import os.path
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

# Cache the index on disk so it is only built once
if not os.path.exists("./storage"):
    # All documents live in the data directory (one or many); load them and build the index
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist()
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context)

# either way we can now query the index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)

# How to use a custom model is covered further below

PS: viewing logs

import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

# logging.INFO or logging.DEBUG ...

Workflow

Basic pipeline: load data → build index → persist index → retrieve → evaluate
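The demo above already covers loading, building, persisting, and querying. For the evaluation step, a minimal sketch using the llama_index.evaluation module (class and method names assumed from roughly this version of the library), reusing the response from the demo above:

from llama_index.evaluation import FaithfulnessEvaluator  # assumed module path for this version

evaluator = FaithfulnessEvaluator()  # defaults to the OpenAI-backed service context
eval_result = evaluator.evaluate_response(response=response)
print(eval_result.passing)  # True if the answer is grounded in the retrieved context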

Customization

Using a specific LLM

from llama_index import (
    ServiceContext,
    VectorStoreIndex,
    SimpleDirectoryReader,
)
from llama_index.llms import PaLM

service_context = ServiceContext.from_defaults(llm=PaLM())

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)
query_engine = index.as_query_engine(service_context=service_context)

Wiring up a local RAG with LlamaIndex, Ollama, and Langchain ⭐️

In my experience this works well on English documents and only passably on Chinese ones, presumably because the intermediate prompts are in English and the llama-family models are strongest in English.

Note: loading Ollama embeddings through Langchain is convenient, and LlamaIndex accepts them directly.

from llama_index import (
    ServiceContext,
    VectorStoreIndex,
    SimpleDirectoryReader,
)
from llama_index.llms import Ollama
from langchain.embeddings import OllamaEmbeddings

oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="mistral")

llm_ollama = Ollama(model="mistral")

# https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html
service_context = ServiceContext.from_defaults(embed_model=oembed, llm=llm_ollama, chunk_size=1000)

documents = SimpleDirectoryReader("data").load_data()

index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)

query_engine = index.as_query_engine(service_context=service_context)
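With the engine wired up, querying works just like the OpenAI demo above (the question is the same placeholder used earlier):

response = query_engine.query("What did the author do growing up?")
print(response)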

Custom chunking

from llama_index import ServiceContext
from llama_index.text_splitter import SentenceSplitter

# originally
index = VectorStoreIndex.from_documents(documents)

# customized: pass in a service_context explicitly
service_context = ServiceContext.from_defaults(chunk_size=1000)

# =================== or ==========================
text_splitter = SentenceSplitter(chunk_size=512, chunk_overlap=10)
service_context = ServiceContext.from_defaults(text_splitter=text_splitter)
# =================================================

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

Returning more results per query

query_engine = index.as_query_engine(similarity_top_k=5)
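If you want to inspect the retrieved chunks themselves rather than the synthesized answer, the retriever interface exposes them; a minimal sketch against the same index (attribute access on the scored nodes is assumed for this version):

retriever = index.as_retriever(similarity_top_k=5)
nodes = retriever.retrieve("What did the author do growing up?")
for node_with_score in nodes:
    # each result carries a similarity score plus the underlying text chunk
    print(node_with_score.score, node_with_score.node.get_content()[:80])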

You can also set response_mode; internally different prompts post-process the result (structuring, summarizing, and so on).

query_engine = index.as_query_engine(response_mode="tree_summarize")
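Other response_mode values I'm aware of include "refine", "compact", "simple_summarize", "accumulate", and "no_text" (treat the exact list as version-dependent), e.g.:

query_engine = index.as_query_engine(response_mode="compact")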

Results can also be streamed:

query_engine = index.as_query_engine(streaming=True)
response = query_engine.query("What did the author do growing up?")
response.print_response_stream()

# an alternative pattern: streaming through the chat engine
chat_engine = index.as_chat_engine()
streaming_response = chat_engine.stream_chat("Tell me a joke.")
for token in streaming_response.response_gen:
    print(token, end="")

Chat mode (learn more)

chat_engine = index.as_chat_engine()

response = chat_engine.chat("What did the author do growing up?")
print(response)
response = chat_engine.chat("Oh interesting, tell me more.")
print(response)

# Reset chat history to start a new conversation:
chat_engine.reset()
# Enter an interactive chat REPL:
chat_engine.chat_repl()
# You can further specify chat_mode: best / condense_question / context / condense_plus_context / simple, etc.
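For example, condense_question mode rewrites each follow-up into a standalone question before hitting the index; a minimal sketch using the mode names listed above:

chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)
response = chat_engine.chat("What did the author do after that?")
print(response)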

Reading data from Obsidian (for more data sources, see LlamaHub)

from llama_index import download_loader
import os

ObsidianReader = download_loader('ObsidianReader')
documents = ObsidianReader('/path/to/dir').load_data() # Returns list of documents

Building a RAG from natural language on top of LlamaIndex, similar to GPTs

  • https://github.com/run-llama/rags

Some excerpts

A nice prompt for handling conversational follow-ups

from llama_index.prompts import PromptTemplate

custom_prompt = PromptTemplate(
    """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
"""
)
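This prompt is typically wired into the condense-question chat engine; a sketch modeled on the LlamaIndex docs of roughly this version (the seeded chat_history messages are illustrative):

from llama_index.llms import ChatMessage, MessageRole
from llama_index.chat_engine.condense_question import CondenseQuestionChatEngine

custom_chat_history = [
    ChatMessage(role=MessageRole.USER, content="Hello assistant, we are discussing the essays in the data directory."),
    ChatMessage(role=MessageRole.ASSISTANT, content="Okay, sounds good."),
]

chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=index.as_query_engine(),
    condense_question_prompt=custom_prompt,
    chat_history=custom_chat_history,
    verbose=True,
)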

References