Ollama 使用记录

November 24, 2023

项目地址

安装

环境安装

在官网点击下载即可

模型安装 | 打开 terminal 运行会自动下载模型

ollama run llama2

更多模型选择

mistral (7B) 目前看对中英文的支持都挺好，值得推荐
llama2 (7B)
llama2:13b (13B)
llava:13b 图文模型
mixtral Mixture of Experts

查看已经安装的模型

ollama list

删除某个模型

ollama rm llama2

使用

update at 2024.01.25

官方提供了 Python & JavaScript Libraries 使用起来会更方便

pip install ollama / npm install ollama

Python 的简单案例

import ollama
response = ollama.chat(model='llama2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])

Ollama Python & JavaScript Libraries

使用过程需要确保 Ollama 应用已经打开

Terminal 模式

运行以下代码即可

ollama run llama2

UI 模式

后续官方估计会出一个客户端，可以期待一下

目前可以考虑使用 ollama-webui 推荐docker 模式启动

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway --name ollama-webui --restart always ghcr.io/ollama-webui/ollama-webui:main

启动后访问 http://localhost:3000 即可

API模式

类似如下模式调用（默认在11434这个端口）

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt":"Why is the sky blue?"
 }'

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt":"Here is a story about llamas eating grass"
 }'

Python 代码

可以直接在代码中调用API模式,也可以使用第三方工具包比如 LangChain 或者 LlamaIndex

LangChain

可以考虑使用 LangChain 来完成这个任务 ,简单流程如下

安装

pip install langchain

加载模型

from langchain.llms import Ollama
llm = Ollama(model="llama2")

非Stream模式

llm("Who are you")

Stream模式

for chunk in llm.stream("Who are you"):
    print(chunk, end="", flush=True)

Embedding 模式

from langchain.embeddings import OllamaEmbeddings

oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="llama2")
oembed.embed_query("Who are you")

LlamaIndex

可以考虑使用 LlamaIndex 来完成这个任务 ,简单流程如下

安装

pip install llama-index

加载模型

from llama_index.llms import Ollama
from llama_index.llms import ChatMessage
llm = Ollama(model="llama2")

非Stream模式

# complete 调用
resp = llm.complete("Who are you")

# chat 调用
messages = [
    ChatMessage(
        role="system", content="You are a personal assistant who's name is Aha"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)

Stream模式

# stream_complete 调用
response = llm.stream_complete("Who are you")
for r in response:
    print(r.delta, end="")

# stream_chat 调用
messages = [
    ChatMessage(
        role="system", content="You are a personal assistant who's name is Aha"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)

for r in response:
    print(r.delta, end="")

部署到Streamlit

import streamlit as st
from langchain.llms import Ollama
llm = Ollama(model="mistral")

if prompt := st.chat_input():
    st.chat_message("user").write(prompt)
    with st.chat_message("assistant"):
        response = llm.stream(prompt)
        placeholder = st.empty()
        full_response = ''
        for item in response:
            full_response += item
            placeholder.markdown(full_response)
        placeholder.markdown(full_response)

Obsidian中使用

使用 Ollama 插件可在第三方库中搜索得到

安装后配置相关常用prompt即可（需要保证Ollama处于运行状态）

Tips :

配置批量 Prompt可以打开当前工作区的 .obsidian 目录 .obsidian/plugins/ollama 直接修改ollama插件目录下的 data.json 文件

参数设置

在任意目录下创建一个 Modelfile

FROM llama2

# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1

# set the system prompt
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""

运行如下代码

ollama create mario -f ./Modelfile
ollama run mario

然后就可以愉快玩耍了

ZJun Tech.