displace/ext-infer

Latest stable version: v0.2.0

Install command (via PIE):

pie install displace/ext-infer

Package description

PHP 8.3+ native, in-process LLM inference and embeddings via llama.cpp.

README

Local LLM inference for PHP, in-process.
Chat, embeddings, and reasoning models — no Python sidecar, no remote API.

What is ext-infer?

ext-infer is a PHP 8.3+ extension that loads a GGUF model and runs inference in the PHP process via llama.cpp. PHP-native semantic search, RAG pipelines, and CLI/worker inference work without shelling out to Python or hitting a remote API.

Written in Rust on top of ext-php-rs and the llama-cpp-2 bindings. The public PHP surface is fluent and role-aware — building a chat prompt looks like Prompt::system(...)->withUser(...), not a string of <|im_start|> tokens.

  • 💬 Chat completions via an immutable Prompt builder that renders through the model's embedded template — no manual <|im_start|> plumbing.
  • 🧱 Structured output — pass a JSON Schema (or raw GBNF grammar) and sampling is constrained so malformed output is impossible, not retried. A 0.6B model becomes a dependable extractor.
  • 🧠 Reasoning-model aware: Response::answer() and Response::reasoning() split Qwen3 / R1-style <think>…</think> output automatically.
  • 📊 Embeddings: Model::embed() returns an Embedding with dimensions(), normalize(), cosineSimilarity(), and packed() (zero-copy handoff to vector indexes) built in.
  • 🎯 Reranking: RerankModel scores (query, document) pairs through Qwen3-Reranker's calibrated yes/no judgment, completing the embed → rerank two-stage retrieval pipeline.
  • In-process — no subprocess fork, no IPC, no daemon. Latency is whatever the model takes to decode.
  • 🛠️ Apple Metal acceleration is opt-in (make release FEATURES=metal); CPU is the portable default.
  • 🧵 Thread-safe: LlamaBackend is a Sync-guarded singleton and each call builds its own context, so ZTS PHP + parallel works by design.
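
To make the embedding helpers listed above concrete, here is a minimal pure-PHP sketch of the math they are named after. It does not call the extension: l2_normalize(), cosine_similarity(), and pack_float32() are hypothetical stand-ins written for illustration, and the little-endian float32 layout is an assumption about what a packed vector might look like, not a statement about what packed() actually returns.

```php
<?php
declare(strict_types=1);

/** Scale a vector to unit L2 length; an all-zero vector is returned unchanged. */
function l2_normalize(array $v): array
{
    $norm = sqrt(array_sum(array_map(fn (float $x): float => $x * $x, $v)));

    return $norm == 0.0 ? $v : array_map(fn (float $x): float => $x / $norm, $v);
}

/** Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 if either vector is all zeros. */
function cosine_similarity(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same dimensions.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot   += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

/** Serialize a vector as little-endian float32 values (4 bytes per dimension). */
function pack_float32(array $v): string
{
    return pack('g*', ...$v);
}

$a = [1.0, 2.0, 3.0];
$b = [2.0, 4.0, 6.0];

printf("similarity: %.4f\n", cosine_similarity($a, $b));
printf("packed bytes: %d\n", strlen(pack_float32($a)));
```

Once two vectors are already unit length, cosine similarity reduces to a plain dot product, which is why normalizing before storing vectors in an index is a common choice.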

Quick start

mkdir -p models
curl -L -o models/Qwen3-0.6B-Q8_0.gguf \
    https://huggingface.co/Qwen/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf

<?php
use Displace\Infer\Model;
use Displace\Infer\Prompt;

$model    = Model::load('models/Qwen3-0.6B-Q8_0.gguf');
$response = $model->chat(
    Prompt::system('You are a helpful assistant.')
        ->withUser('What is 2+2?'),
    maxTokens: 256,
    temperature: 0.0,
);

echo $response->answer(), PHP_EOL;   // "2 + 2 equals 4."
echo $response->reasoning() ?? '';    // captured <think>…</think>, if any

$model->close();

make build       # produces target/debug/libinfer.{so,dylib}
php -d extension=$PWD/target/debug/libinfer.dylib hello.php

Full walkthrough — including the interactive Symfony Console chat and pairwise-similarity embedding example — under examples/.
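
The answer() / reasoning() split used in the quick start can be pictured with a small pure-PHP sketch. This is an illustration of the idea only: split_reasoning() is a hypothetical helper, not part of ext-infer, and the extension's own parsing may handle edge cases (unclosed tags, multiple blocks) differently.

```php
<?php
declare(strict_types=1);

/**
 * Split raw model text into [reasoning, answer].
 * Hypothetical helper for illustration; handles only the first <think> block.
 */
function split_reasoning(string $raw): array
{
    // Non-greedy match, with the "s" flag so "." also spans newlines.
    if (preg_match('~<think>(.*?)</think>~s', $raw, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return [null, trim($raw)];
    }

    $blockStart = $m[0][1];                 // byte offset of "<think>"
    $blockEnd   = $blockStart + strlen($m[0][0]);
    $reasoning  = trim($m[1][0]);

    // The answer is everything outside the matched block.
    $answer = trim(substr($raw, 0, $blockStart) . substr($raw, $blockEnd));

    return [$reasoning === '' ? null : $reasoning, $answer];
}

[$reasoning, $answer] = split_reasoning(
    "<think>\nAdd the two numbers.\n</think>\n\n2 + 2 equals 4."
);

echo $answer, PHP_EOL;
echo $reasoning ?? '', PHP_EOL;
```

Returning null when no reasoning is present mirrors the `?? ''` fallback in the quick start, so callers can tell "no reasoning" apart from an empty string.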

Documentation

infer.displace.tech hosts the full guide:

  • Getting started — install via PIE or from source, verify, troubleshoot.
  • Guide — prompts, chat, raw, embeddings, choosing a model.
  • Recipes — multi-turn chat, semantic search, RAG over markdown, worker pools.
  • Reference — full API surface, exceptions, environment variables, compatibility matrix.
  • Advanced — threading, Metal, performance tuning.

The site is built from docs/ with mdbook and deploys automatically on every push to main.

Compatibility

Matrix columns (platforms): macOS arm64, Linux x86_64, Linux arm64, Windows
Matrix rows (PHP versions): PHP 8.3, PHP 8.4, PHP 8.5

ZTS is supported by design (the code is thread-safe), enabled in composer.json, and not yet exercised in CI. Windows is intentionally out of scope for v0.1.
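
To see where a given runtime sits in that matrix, a short script can report the relevant facts. This is a sketch using only built-in PHP constants and functions; the extension name "infer" is an assumption inferred from the libinfer.{so,dylib} filename, so adjust it if `php -m` lists something else.

```php
<?php
declare(strict_types=1);

// Assumption: the extension registers under this name (guessed from libinfer.*).
const ASSUMED_EXTENSION_NAME = 'infer';

/** Collect the runtime facts that matter for the compatibility matrix. */
function describe_runtime(): array
{
    return [
        'php_version'   => PHP_VERSION,
        'php_supported' => PHP_VERSION_ID >= 80300,   // PHP 8.3 or newer
        'thread_safe'   => (bool) PHP_ZTS,            // true on ZTS builds
        'os_family'     => PHP_OS_FAMILY,             // e.g. "Darwin", "Linux"
        'architecture'  => php_uname('m'),            // e.g. "arm64", "x86_64"
        'extension'     => extension_loaded(ASSUMED_EXTENSION_NAME),
    ];
}

foreach (describe_runtime() as $key => $value) {
    $shown = is_bool($value) ? ($value ? 'yes' : 'no') : $value;
    printf("%-14s %s\n", $key, $shown);
}
```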

Roadmap

Shipped   chat completions · raw completions · grammar/JSON-Schema constrained generation · embeddings (+ packed float32 output) · RerankModel · reasoning split · typed exceptions · PHPT suite · CI matrix · PIE-compatible composer.json · tag-triggered binary release workflow · THIRD-PARTY-NOTICES + cargo about license manifest.

Next (v0.3+)   streaming completions · KV-cache reuse via reusable Session objects · stop-string support · LoRA adapters · tool calling · Apple Metal default on macos-arm64.

See PLAN.md for the current planning doc and RELEASE.md for the cut-a-release flow.

License

MIT © 2026 Eric Mann / Displace Technologies

Statistics

  • Total downloads: 9
  • Monthly downloads: 0
  • Daily downloads: 0
  • Favorites: 4
  • Clicks: 2
  • Dependents: 0
  • Suggesters: 0

GitHub info

  • Stars: 4
  • Watchers: 0
  • Forks: 0
  • Language: Rust

Other information

  • License: MIT
  • Last updated: 2026-06-07
