
Monday, 8 December 2025

Setting up stbt, a Next.js-based static blog program (with pagination support)


First, fork this project: https://github.com/sub-t/blog-template. My fork is at https://github.com/briteming/stbt.
Then go to https://github.com/briteming/stbt/tree/main/_posts and create a new source post, test.md, with the following content:

---
title: '测试'
excerpt: '这是一篇文章'
coverImage: '/assets/blog/dynamic-routing/cover.jpg'
date: '2025-11-29T09:19:00'
ogImage:
  url: '/assets/blog/dynamic-routing/cover.jpg'
tags:
  - 'misc1'
  - 'misc2'
  - 'misc3'
---

这是测试。

看看如何?

(See https://github.com/briteming/stbt/blob/main/_posts/test.md?plain=1 for the full source.)

Then go to vercel.com/new, import the project https://github.com/briteming/stbt, click the Deploy button, and wait for the deployment to finish. Once it completes, I get the site https://stbt-moon.vercel.app/

After a new source post is added, the blog site updates within about 2 minutes.
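For example, a new post can be added entirely from the command line. The sketch below assumes you have cloned your fork locally and that Vercel redeploys on every push to the main branch (the default behaviour for projects imported from GitHub); the post filename is just an illustration:

git clone https://github.com/briteming/stbt.git
cd stbt
# create a new source post under _posts/ with the same front matter as test.md above
# (e.g. _posts/my-new-post.md), then commit and push:
git add _posts/my-new-post.md
git commit -m "add new post"
git push origin main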

Project repositories:

https://github.com/sub-t/blog-template

 https://github.com/briteming/stbt

Demo blog:

https://stbt-moon.vercel.app/

https://stbt-moon.vercel.app/posts/page/1/ (pagination is supported).

stbt is another static blog program by the author of the static blog program snms (https://briteming.blogspot.com/2025/11/nextjssnms.html). It cannot render embedded videos (see, for example, https://stbt-moon.vercel.app/posts/fh/), so the workaround is to paste the video's link instead.

 related post: https://briteming.blogspot.com/2025/11/nextjssnms.html

 

VibeVoice


microsoft.github.io/VibeVoice/

Open-Source Frontier Voice AI: VibeVoice

Project Page · Hugging Face · Technical Report


📰 News

New Realtime TTS

2025-12-03: 📣 We open-sourced VibeVoice‑Realtime‑0.5B, a real‑time text‑to‑speech model that supports streaming text input and robust long-form speech generation. Try it on Colab.

2025-12-09: 📣 We’ve added experimental speakers in nine languages (DE, FR, IT, JP, KR, NL, PL, PT, ES) for exploration—welcome to try them out and share your feedback.

To mitigate deepfake risks and ensure low latency for the first speech chunk, voice prompts are provided in an embedded format. For users requiring voice customization, please reach out to our team. We will also be expanding the range of available speakers.

(Demo video: VibeVoice_Realtime.mp4)

(Launch your own realtime demo via the websocket example in Usage).

2025-09-05: VibeVoice is an open-source research framework intended to advance collaboration in the speech synthesis community. After release, we discovered instances where the tool was used in ways inconsistent with the stated intent. Since responsible use of AI is one of Microsoft’s guiding principles, we have disabled this repo until we are confident that out-of-scope use is no longer possible.

Overview

VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly in scalability, speaker consistency, and natural turn-taking.

VibeVoice currently includes two model variants:

  • Long-form multi-speaker model: Synthesizes conversational/single-speaker speech up to 90 minutes with up to 4 distinct speakers, surpassing the typical 1–2 speaker limits of many prior models.
  • Realtime streaming TTS model: Produces initial audible speech in ~300 ms and supports streaming text input for single-speaker real-time speech generation; designed for low-latency generation.

A core innovation of VibeVoice is its use of continuous speech tokenizers (Acoustic and Semantic) operating at an ultra-low frame rate of 7.5 Hz. These tokenizers efficiently preserve audio fidelity while significantly boosting computational efficiency for processing long sequences. VibeVoice employs a next-token diffusion framework, leveraging a Large Language Model (LLM) to understand textual context and dialogue flow, and a diffusion head to generate high-fidelity acoustic details.
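To put that frame rate in perspective (a back-of-the-envelope calculation, not a figure from the report): at 7.5 frames per second, a 90-minute session corresponds to roughly 90 × 60 × 7.5 ≈ 40,500 tokenizer frames per stream, which is what keeps sequences of that length tractable for the LLM and diffusion head.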

(Figures: MOS preference results; VibeVoice architecture overview.)

🎵 Demo Examples

Video Demo

We produced this video with Wan2.2. We sincerely appreciate the Wan-Video team for their great work.

 

Risks and limitations

While efforts have been made to optimize it through various techniques, it may still produce outputs that are unexpected, biased, or inaccurate. VibeVoice inherits any biases, errors, or omissions produced by its base model (specifically, Qwen2.5-1.5B in this release).

Potential for Deepfakes and Disinformation: High-quality synthetic speech can be misused to create convincing fake audio content for impersonation, fraud, or spreading disinformation. Users must ensure transcripts are reliable, check content accuracy, and avoid using generated content in misleading ways. Users are expected to use the generated content and to deploy the models in a lawful manner, in full compliance with all applicable laws and regulations in the relevant jurisdictions. It is best practice to disclose the use of AI when sharing AI-generated content.

English and Chinese only: Transcripts in languages other than English or Chinese may result in unexpected audio outputs.

Non-Speech Audio: The model focuses solely on speech synthesis and does not handle background noise, music, or other sound effects.

Overlapping Speech: The current model does not explicitly model or generate overlapping speech segments in conversations.

We do not recommend using VibeVoice in commercial or real-world applications without further testing and development. This model is intended for research and development purposes only. Please use responsibly.

from  https://github.com/microsoft/VibeVoice

Open-WebUI

 

User-friendly AI Interface (Supports Ollama, OpenAI API, ...)

openwebui.com


Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners like Ollama and OpenAI-compatible APIs, with a built-in inference engine for RAG, making it a powerful AI deployment solution.

Passionate about open-source AI? Join our team →


Tip

Looking for an Enterprise Plan? Speak with Our Sales Team Today!

Get enhanced capabilities, including custom theming and branding, Service Level Agreement (SLA) support, Long-Term Support (LTS) versions, and more!

For more information, be sure to check out our Open WebUI Documentation.

Key Features of Open WebUI ⭐

  • 🚀 Effortless Setup: Install seamlessly using Docker or Kubernetes (kubectl, kustomize or helm) for a hassle-free experience with support for both :ollama and :cuda tagged images.

  • 🤝 Ollama/OpenAI API Integration: Effortlessly integrate OpenAI-compatible APIs for versatile conversations alongside Ollama models. Customize the OpenAI API URL to link with LMStudio, GroqCloud, Mistral, OpenRouter, and more (see the example command after this feature list).

  • 🛡️ Granular Permissions and User Groups: By allowing administrators to create detailed user roles and permissions, we ensure a secure user environment. This granularity not only enhances security but also allows for customized user experiences, fostering a sense of ownership and responsibility amongst users.

  • 📱 Responsive Design: Enjoy a seamless experience across Desktop PC, Laptop, and Mobile devices.

  • 📱 Progressive Web App (PWA) for Mobile: Enjoy a native app-like experience on your mobile device with our PWA, providing offline access on localhost and a seamless user interface.

  • ✒️🔢 Full Markdown and LaTeX Support: Elevate your LLM experience with comprehensive Markdown and LaTeX capabilities for enriched interaction.

  • 🎤📹 Hands-Free Voice/Video Call: Experience seamless communication with integrated hands-free voice and video call features using multiple Speech-to-Text providers (Local Whisper, OpenAI, Deepgram, Azure) and Text-to-Speech engines (Azure, ElevenLabs, OpenAI, Transformers, WebAPI), allowing for dynamic and interactive chat environments.

  • 🛠️ Model Builder: Easily create Ollama models via the Web UI. Create and add custom characters/agents, customize chat elements, and import models effortlessly through Open WebUI Community integration.

  • 🐍 Native Python Function Calling Tool: Enhance your LLMs with built-in code editor support in the tools workspace. Bring Your Own Function (BYOF) by simply adding your pure Python functions, enabling seamless integration with LLMs.

  • 💾 Persistent Artifact Storage: Built-in key-value storage API for artifacts, enabling features like journals, trackers, leaderboards, and collaborative tools with both personal and shared data scopes across sessions.

  • 📚 Local RAG Integration: Dive into the future of chat interactions with groundbreaking Retrieval Augmented Generation (RAG) support using your choice of 9 vector databases and multiple content extraction engines (Tika, Docling, Document Intelligence, Mistral OCR, External loaders). Load documents directly into chat or add files to your document library, effortlessly accessing them using the # command before a query.

  • 🔍 Web Search for RAG: Perform web searches using 15+ providers including SearXNG, Google PSE, Brave Search, Kagi, Mojeek, Tavily, Perplexity, serpstack, serper, Serply, DuckDuckGo, SearchApi, SerpApi, Bing, Jina, Exa, Sougou, Azure AI Search, and Ollama Cloud, injecting results directly into your chat experience.

  • 🌐 Web Browsing Capability: Seamlessly integrate websites into your chat experience using the # command followed by a URL. This feature allows you to incorporate web content directly into your conversations, enhancing the richness and depth of your interactions.

  • 🎨 Image Generation & Editing Integration: Create and edit images using multiple engines including OpenAI's DALL-E, Gemini, ComfyUI (local), and AUTOMATIC1111 (local), with support for both generation and prompt-based editing workflows.

  • ⚙️ Many Models Conversations: Effortlessly engage with various models simultaneously, harnessing their unique strengths for optimal responses. Enhance your experience by leveraging a diverse set of models in parallel.

  • 🔐 Role-Based Access Control (RBAC): Ensure secure access with restricted permissions; only authorized individuals can access your Ollama, and exclusive model creation/pulling rights are reserved for administrators.

  • 🗄️ Flexible Database & Storage Options: Choose from SQLite (with optional encryption), PostgreSQL, or configure cloud storage backends (S3, Google Cloud Storage, Azure Blob Storage) for scalable deployments.

  • 🔍 Advanced Vector Database Support: Select from 9 vector database options including ChromaDB, PGVector, Qdrant, Milvus, Elasticsearch, OpenSearch, Pinecone, S3Vector, and Oracle 23ai for optimal RAG performance.

  • 🔐 Enterprise Authentication: Full support for LDAP/Active Directory integration, SCIM 2.0 automated provisioning, and SSO via trusted headers alongside OAuth providers. Enterprise-grade user and group provisioning through SCIM 2.0 protocol, enabling seamless integration with identity providers like Okta, Azure AD, and Google Workspace for automated user lifecycle management.

  • ☁️ Cloud-Native Integration: Native support for Google Drive and OneDrive/SharePoint file picking, enabling seamless document import from enterprise cloud storage.

  • 📊 Production Observability: Built-in OpenTelemetry support for traces, metrics, and logs, enabling comprehensive monitoring with your existing observability stack.

  • ⚖️ Horizontal Scalability: Redis-backed session management and WebSocket support for multi-worker and multi-node deployments behind load balancers.

  • 🌐🌍 Multilingual Support: Experience Open WebUI in your preferred language with our internationalization (i18n) support. Join us in expanding our supported languages! We're actively seeking contributors!

  • 🧩 Pipelines, Open WebUI Plugin Support: Seamlessly integrate custom logic and Python libraries into Open WebUI using Pipelines Plugin Framework. Launch your Pipelines instance, set the OpenAI URL to the Pipelines URL, and explore endless possibilities. Examples include Function Calling, User Rate Limiting to control access, Usage Monitoring with tools like Langfuse, Live Translation with LibreTranslate for multilingual support, Toxic Message Filtering and much more.

  • 🌟 Continuous Updates: We are committed to improving Open WebUI with regular updates, fixes, and new features.
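As a sketch of the API-URL customization mentioned in the Ollama/OpenAI API Integration bullet above: Open WebUI reads the OPENAI_API_BASE_URL and OPENAI_API_KEY environment variables, so pointing it at any OpenAI-compatible endpoint from Docker can look like the command below (the LM Studio URL and placeholder key are assumptions for illustration; adjust them to your provider):

docker run -d -p 3000:8080 -e OPENAI_API_BASE_URL=http://host.docker.internal:1234/v1 -e OPENAI_API_KEY=placeholder --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main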

Want to learn more about Open WebUI's features? Check out our Open WebUI documentation for a comprehensive overview!


We are incredibly grateful for the generous support of our sponsors. Their contributions help us to maintain and improve our project, ensuring we can continue to deliver quality work to our community. Thank you!

How to Install 🚀

Installation via Python pip 🐍

Open WebUI can be installed using pip, the Python package installer. Before proceeding, ensure you're using Python 3.11 to avoid compatibility issues.

  1. Install Open WebUI: Open your terminal and run the following command to install Open WebUI:

    pip install open-webui

  2. Running Open WebUI: After installation, you can start Open WebUI by executing:

    open-webui serve

This will start the Open WebUI server, which you can access at http://localhost:8080
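Since the pip route expects Python 3.11, one way to keep it isolated is a dedicated virtual environment. The commands below are a minimal sketch assuming a python3.11 interpreter is already installed on your system:

python3.11 -m venv open-webui-env
source open-webui-env/bin/activate
pip install open-webui
open-webui serve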

Quick Start with Docker 🐳

Note

Please note that for certain Docker environments, additional configurations might be needed. If you encounter any connection issues, our detailed guide on Open WebUI Documentation is ready to assist you.

Warning

When using Docker to install Open WebUI, make sure to include -v open-webui:/app/backend/data in your Docker command. This step is crucial, as it ensures your database is properly mounted and prevents any loss of data.

Tip

If you wish to utilize Open WebUI with Ollama included or CUDA acceleration, we recommend utilizing our official images tagged with either :cuda or :ollama. To enable CUDA, you must install the Nvidia CUDA container toolkit on your Linux/WSL system.

Installation with Default Configuration

If Ollama is on your computer, use this command:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

If Ollama is on a Different Server, use this command:

To connect to Ollama on another server, change the OLLAMA_BASE_URL to the server's URL:

docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=https://example.com -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

To run Open WebUI with Nvidia GPU support, use this command:

docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

Installation for OpenAI API Usage Only

If you're only using OpenAI API, use this command:

docker run -d -p 3000:8080 -e OPENAI_API_KEY=your_secret_key -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Installing Open WebUI with Bundled Ollama Support

This installation method uses a single container image that bundles Open WebUI with Ollama, allowing for a streamlined setup via a single command. Choose the appropriate command based on your hardware setup:

  • With GPU Support: Utilize GPU resources by running the following command:

    docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

For CPU Only: If you're not using a GPU, use this command instead:

docker run -d -p 3000:8080 -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama

Both commands facilitate a built-in, hassle-free installation of both Open WebUI and Ollama, ensuring that you can get everything up and running swiftly.

After installation, you can access Open WebUI at http://localhost:3000. Enjoy! 😄

Other Installation Methods

We offer various installation alternatives, including non-Docker native installation methods, Docker Compose, Kustomize, and Helm. Visit our Open WebUI Documentation or join our Discord community for comprehensive guidance.

Look at the Local Development Guide for instructions on setting up a local development environment.

Troubleshooting

Encountering connection issues? Our Open WebUI Documentation has got you covered. For further assistance and to join our vibrant community, visit the Open WebUI Discord.

Open WebUI: Server Connection Error

If you're experiencing connection issues, it's often because the WebUI docker container cannot reach the Ollama server at 127.0.0.1:11434 (host.docker.internal:11434) from inside the container. Use the --network=host flag in your docker command to resolve this. Note that the port changes from 3000 to 8080, resulting in the link: http://localhost:8080.

Example Docker Command:

docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Keeping Your Docker Installation Up-to-Date

In case you want to update your local Docker installation to the latest version, you can do it with Watchtower:

docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui

In the last part of the command, replace open-webui with your container name if it is different.

Check our Updating Guide available in our Open WebUI Documentation.

Using the Dev Branch 

Warning

The :dev branch contains the latest unstable features and changes. Use it at your own risk as it may have bugs or incomplete features.

If you want to try out the latest bleeding-edge features and are okay with occasional instability, you can use the :dev tag like this:

docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui --add-host=host.docker.internal:host-gateway --restart always ghcr.io/open-webui/open-webui:dev

Offline Mode

If you are running Open WebUI in an offline environment, you can set the HF_HUB_OFFLINE environment variable to 1 to prevent attempts to download models from the internet.

export HF_HUB_OFFLINE=1
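If you run Open WebUI via Docker instead, the same variable can be passed into the container with -e; the command below is a sketch based on the default-configuration command above:

docker run -d -p 3000:8080 -e HF_HUB_OFFLINE=1 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main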

What's Next? 🌟

Discover upcoming features on our roadmap in the Open WebUI Documentation.
from https://github.com/open-webui/open-webui

    Fooocus

     Focus on prompting.

    >>> Click Here to Install Fooocus <<<

    Fooocus is image generation software (based on Gradio).

    Fooocus presents a rethinking of image generator designs. The software is offline, open source, and free, while at the same time, similar to many online image generators like Midjourney, manual tweaking is not needed and users only need to focus on the prompts and images. Fooocus has also simplified the installation: between pressing "download" and generating the first image, the number of needed mouse clicks is strictly limited to fewer than 3. The minimal GPU memory requirement is 4GB (Nvidia).

    Recently, many fake websites have appeared in Google results when you search for "fooocus". Do not trust them – this is the only official source of Fooocus.

    Project Status: Limited Long-Term Support (LTS) with Bug Fixes Only

    The Fooocus project, built entirely on the Stable Diffusion XL architecture, is now in a state of limited long-term support (LTS) with bug fixes only. As the existing functionalities are considered nearly free of programmatic issues (thanks to mashb1t's huge efforts), future updates will focus exclusively on addressing any bugs that may arise.

    There are no current plans to migrate to or incorporate newer model architectures. However, this may change over time as the open-source community develops. For example, if the community converges on one single dominant method for image generation (which may well happen within half a year to a year given the current status), Fooocus may also migrate to that exact method.

    For those interested in utilizing newer models such as Flux, we recommend exploring alternative platforms such as WebUI Forge (also from us), ComfyUI/SwarmUI. Additionally, several excellent forks of Fooocus are available for experimentation.

    Again, many fake websites have recently appeared in Google results when you search "fooocus". Do NOT get Fooocus from those websites – this page is the only official source of Fooocus. We have never had any website such as "fooocus.com", "fooocus.net", "fooocus.co", "fooocus.ai", "fooocus.org", "fooocus.pro", or "fooocus.one". Those websites are ALL FAKE. They have ABSOLUTELY no relationship to us. Fooocus is 100% non-commercial, offline, open-source software.

    Features

    Below is a quick list using Midjourney's examples (Midjourney feature → Fooocus equivalent):

    • High-quality text-to-image without needing much prompt engineering or parameter tuning (unknown method) → High-quality text-to-image without needing much prompt engineering or parameter tuning (Fooocus has an offline GPT-2 based prompt processing engine and lots of sampling improvements, so results are always beautiful whether your prompt is as short as "house in garden" or as long as 1000 words).
    • V1 V2 V3 V4 → Input Image -> Upscale or Variation -> Vary (Subtle) / Vary (Strong)
    • U1 U2 U3 U4 → Input Image -> Upscale or Variation -> Upscale (1.5x) / Upscale (2x)
    • Inpaint / Up / Down / Left / Right (Pan) → Input Image -> Inpaint or Outpaint -> Inpaint / Up / Down / Left / Right (Fooocus uses its own inpaint algorithm and inpaint models, so results are more satisfying than in other software that uses the standard SDXL inpaint method/model).
    • Image Prompt → Input Image -> Image Prompt (Fooocus uses its own image prompt algorithm, so result quality and prompt understanding are more satisfying than in other software that uses standard SDXL methods like standard IP-Adapters or Revisions).
    • --style → Advanced -> Style
    • --stylize → Advanced -> Advanced -> Guidance
    • --niji → Multiple launchers: "run.bat", "run_anime.bat", and "run_realistic.bat". Fooocus supports SDXL models on Civitai (you can Google "Civitai" if you do not know about it).
    • --quality → Advanced -> Quality
    • --repeat → Advanced -> Image Number
    • Multi Prompts (::) → Just use multiple lines of prompts.
    • Prompt Weights → You can use "I am (happy:1.5)". Fooocus uses A1111's reweighting algorithm, so results are better than ComfyUI when users directly copy prompts from Civitai (because if prompts are written in ComfyUI's reweighting, users are less likely to copy the prompt text, as they prefer dragging files). To use an embedding, you can use "(embedding:file_name:1.1)".
    • --no → Advanced -> Negative Prompt
    • --ar → Advanced -> Aspect Ratios
    • InsightFace → Input Image -> Image Prompt -> Advanced -> FaceSwap
    • Describe → Input Image -> Describe

    Below is a quick list using LeonardoAI's examples (LeonardoAI feature → Fooocus equivalent):

    • Prompt Magic → Advanced -> Style -> Fooocus V2
    • Advanced Sampler Parameters (like Contrast/Sharpness/etc.) → Advanced -> Advanced -> Sampling Sharpness / etc.
    • User-friendly ControlNets → Input Image -> Image Prompt -> Advanced

    Also, click here to browse the advanced features.

    Download

    Windows

    You can directly download Fooocus with:

    >>> Click here to download <<<

    After you download the file, please uncompress it and then run the "run.bat".


    The first time you launch the software, it will automatically download models:

    1. It will download default models to the folder "Fooocus\models\checkpoints" given different presets. You can download them in advance if you do not want automatic download.
    2. Note that if you use inpaint, at the first time you inpaint an image, it will download Fooocus's own inpaint control model from here as the file "Fooocus\models\inpaint\inpaint_v26.fooocus.patch" (the size of this file is 1.28GB).

    After Fooocus 2.1.60, you will also have run_anime.bat and run_realistic.bat. They are different model presets (and require different models, but they will be automatically downloaded). Check here for more details.

    After Fooocus 2.3.0 you can also switch presets directly in the browser. Keep in mind to add these arguments if you want to change the default behavior:

    • Use --disable-preset-selection to disable preset selection in the browser.
    • Use --always-download-new-model to download missing models on preset switch. The default is to fall back to the previous_default_models defined in the corresponding preset; see also the terminal output.
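    For instance, following the run.bat editing pattern shown later in this README (see the Windows AMD and localization sections), either flag can simply be appended to the launch line. The line below is a sketch for the default Windows package:

    .\python_embeded\python.exe -s Fooocus\entry_with_update.py --disable-preset-selection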


    If you already have these files, you can copy them to the above locations to speed up installation.

    Note that if you see "MetadataIncompleteBuffer" or "PytorchStreamReader", then your model files are corrupted. Please download models again.

    Below is a test on a relatively low-end laptop with 16GB system RAM and 6GB VRAM (Nvidia 3060 laptop). The speed on this machine is about 1.35 seconds per iteration. Pretty impressive – nowadays laptops with a 3060 are usually available at a very acceptable price.


    Besides, many other projects have recently reported that Nvidia drivers above 532 are sometimes 10x slower than Nvidia driver 531. If your generation time is very long, consider downloading Nvidia Driver 531 Laptop or Nvidia Driver 531 Desktop.

    Note that the minimal requirement is 4GB Nvidia GPU memory (4GB VRAM) and 8GB system memory (8GB RAM). This requires using Microsoft’s Virtual Swap technique, which is automatically enabled by your Windows installation in most cases, so you often do not need to do anything about it. However, if you are not sure, or if you manually turned it off (would anyone really do that?), or if you see any "RuntimeError: CPUAllocator", you can enable it here:

    Click here to see the image instructions.


    And make sure that you have at least 40GB of free space on each drive if you still see "RuntimeError: CPUAllocator"!

    Please open an issue if you use similar devices but still cannot achieve acceptable performances.

    Note that the minimal requirement for different platforms is different.

    See also the common problems and troubleshooting tips here.

    Colab

    (Last tested - 2024 Aug 12 by mashb1t)

    Colab: Open In Colab (Fooocus Official)

    In Colab, you can modify the last line to !python entry_with_update.py --share --always-high-vram or !python entry_with_update.py --share --always-high-vram --preset anime or !python entry_with_update.py --share --always-high-vram --preset realistic for Fooocus Default/Anime/Realistic Edition.

    You can also change the preset in the UI. Please be aware that this may lead to timeouts after 60 seconds. If this is the case, please wait until the download has finished, change the preset to the initial one and back to the one you've selected, or reload the page.

    Note that this Colab will disable the refiner by default because free-tier Colab's resources are relatively limited (and some "big" features like image prompt may cause free-tier Colab to disconnect). We make sure that basic text-to-image always works on free-tier Colab.

    Using --always-high-vram shifts resource allocation from RAM to VRAM and achieves the overall best balance between performance, flexibility and stability on the default T4 instance. Please find more information here.

    Thanks to camenduru for the template!

    Linux (Using Anaconda)

    If you want to use Anaconda/Miniconda, you can

    git clone https://github.com/lllyasviel/Fooocus.git
    cd Fooocus
    conda env create -f environment.yaml
    conda activate fooocus
    pip install -r requirements_versions.txt
    

    Then download the models: download default models to the folder "Fooocus\models\checkpoints". Or let Fooocus automatically download the models using the launcher:

    conda activate fooocus
    python entry_with_update.py
    

    Or, if you want to open a remote port, use

    conda activate fooocus
    python entry_with_update.py --listen
    

    Use python entry_with_update.py --preset anime or python entry_with_update.py --preset realistic for Fooocus Anime/Realistic Edition.

    Linux (Using Python Venv)

    Your Linux needs Python 3.10 installed. Assuming your Python can be called with the command python3 and your venv system is working, you can

    git clone https://github.com/lllyasviel/Fooocus.git
    cd Fooocus
    python3 -m venv fooocus_env
    source fooocus_env/bin/activate
    pip install -r requirements_versions.txt
    

    See the above sections for model downloads. You can launch the software with:

    source fooocus_env/bin/activate
    python entry_with_update.py
    

    Or, if you want to open a remote port, use

    source fooocus_env/bin/activate
    python entry_with_update.py --listen
    

    Use python entry_with_update.py --preset anime or python entry_with_update.py --preset realistic for Fooocus Anime/Realistic Edition.

    Linux (Using native system Python)

    If you know what you are doing, and your Linux already has Python 3.10 installed, and your Python can be called with the command python3 (and Pip with pip3), you can

    git clone https://github.com/lllyasviel/Fooocus.git
    cd Fooocus
    pip3 install -r requirements_versions.txt
    

    See the above sections for model downloads. You can launch the software with:

    python3 entry_with_update.py
    

    Or, if you want to open a remote port, use

    python3 entry_with_update.py --listen
    

    Use python entry_with_update.py --preset anime or python entry_with_update.py --preset realistic for Fooocus Anime/Realistic Edition.

    Linux (AMD GPUs)

    Note that the minimal requirement for different platforms is different.

    Same as the above instructions, but you need to change torch to the AMD version:

    pip uninstall torch torchvision torchaudio torchtext functorch xformers 
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
    

    AMD is not intensively tested, however. The AMD support is in beta.

    Use python entry_with_update.py --preset anime or python entry_with_update.py --preset realistic for Fooocus Anime/Realistic Edition.

    Windows (AMD GPUs)

    Note that the minimal requirement for different platforms is different.

    Same as the Windows instructions above. Download the software and edit the content of run.bat as:

    .\python_embeded\python.exe -m pip uninstall torch torchvision torchaudio torchtext functorch xformers -y
    .\python_embeded\python.exe -m pip install torch-directml
    .\python_embeded\python.exe -s Fooocus\entry_with_update.py --directml
    pause
    

    Then run the run.bat.

    AMD is not intensively tested, however. The AMD support is in beta.

    For AMD, use .\python_embeded\python.exe Fooocus\entry_with_update.py --directml --preset anime or .\python_embeded\python.exe Fooocus\entry_with_update.py --directml --preset realistic for Fooocus Anime/Realistic Edition.

    Mac

    Note that the minimal requirement for different platforms is different.

    Mac is not intensively tested. Below is an unofficial guideline for using Mac. You can discuss problems here.

    You can install Fooocus on Apple Mac silicon (M1 or M2) with macOS 'Catalina' or a newer version. Fooocus runs on Apple silicon computers via PyTorch MPS device acceleration. Mac Silicon computers don't come with a dedicated graphics card, resulting in significantly longer image processing times compared to computers with dedicated graphics cards.

    1. Install the conda package manager and pytorch nightly. Read the Accelerated PyTorch training on Mac Apple Developer guide for instructions. Make sure pytorch recognizes your MPS device.
    2. Open the macOS Terminal app and clone this repository with git clone https://github.com/lllyasviel/Fooocus.git.
    3. Change to the new Fooocus directory, cd Fooocus.
    4. Create a new conda environment, conda env create -f environment.yaml.
    5. Activate your new conda environment, conda activate fooocus.
    6. Install the packages required by Fooocus, pip install -r requirements_versions.txt.
    7. Launch Fooocus by running python entry_with_update.py. (Some Mac M2 users may need python entry_with_update.py --disable-offload-from-vram to speed up model loading/unloading.) The first time you run Fooocus, it will automatically download the Stable Diffusion SDXL models and will take a significant amount of time, depending on your internet connection.

    Use python entry_with_update.py --preset anime or python entry_with_update.py --preset realistic for Fooocus Anime/Realistic Edition.

    Docker

    See docker.md

    Download Previous Version

    See the guidelines here.

    Minimal Requirement

    Below is the minimal requirement for running Fooocus locally. If your device capability is lower than this spec, you may not be able to use Fooocus locally. (Please let us know, in any case, if your device capability is lower but Fooocus still works.)

    Operating System · GPU · Minimal GPU Memory · Minimal System Memory · System Swap · Note
    • Windows/Linux · Nvidia RTX 4XXX · 4GB · 8GB · Required · fastest
    • Windows/Linux · Nvidia RTX 3XXX · 4GB · 8GB · Required · usually faster than RTX 2XXX
    • Windows/Linux · Nvidia RTX 2XXX · 4GB · 8GB · Required · usually faster than GTX 1XXX
    • Windows/Linux · Nvidia GTX 1XXX · 8GB (* 6GB uncertain) · 8GB · Required · only marginally faster than CPU
    • Windows/Linux · Nvidia GTX 9XX · 8GB · 8GB · Required · faster or slower than CPU
    • Windows/Linux · Nvidia GTX < 9XX · Not supported
    • Windows · AMD GPU · 8GB (updated 2023 Dec 30) · 8GB · Required · via DirectML (* ROCm is on hold), about 3x slower than Nvidia RTX 3XXX
    • Linux · AMD GPU · 8GB · 8GB · Required · via ROCm, about 1.5x slower than Nvidia RTX 3XXX
    • Mac · M1/M2 MPS · Shared · Shared · Shared · about 9x slower than Nvidia RTX 3XXX
    • Windows/Linux/Mac · CPU only · 0GB · 32GB · Required · about 17x slower than Nvidia RTX 3XXX

    * AMD GPU ROCm (on hold): AMD is still working on supporting ROCm on Windows.

    * Nvidia GTX 1XXX 6GB uncertain: Some people report 6GB success on GTX 10XX, but some other people report failure cases.

    Note that Fooocus is only for extremely high-quality image generation. We will not support smaller models to reduce the requirements at the cost of result quality.

    Troubleshoot

    See the common problems here.

    Default Models

    Given different goals, the default models and configs of Fooocus are different:

    Task · Windows launcher · Linux args · Main Model · Refiner
    • General · run.bat · (none) · juggernautXL_v8Rundiffusion · not used here
    • Realistic · run_realistic.bat · --preset realistic · realisticStockPhoto_v20 · not used here
    • Anime · run_anime.bat · --preset anime · animaPencilXL_v500 · not used here

    Note that the download is automatic - you do not need to do anything if the internet connection is okay. However, you can download the models in advance (or move them from somewhere else) if you have your own preparation.

    UI Access and Authentication

    In addition to running on localhost, Fooocus can also expose its UI in two ways:

    • Local UI listener: use --listen (specify port e.g. with --port 8888).
    • API access: use --share (registers an endpoint at .gradio.live).

    In both ways the access is unauthenticated by default. You can add basic authentication by creating a file called auth.json in the main directory, which contains a list of JSON objects with the keys user and pass (see example in auth-example.json).
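    A minimal sketch of such an auth.json (the usernames and passwords below are placeholders; see auth-example.json in the repository for the authoritative format):

    [
        {"user": "admin", "pass": "change-me"},
        {"user": "guest", "pass": "also-change-me"}
    ]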

    List of "Hidden" Tricks

    These tricks are based on SDXL and are not very up-to-date with the latest models.
    1. GPT2-based prompt expansion as a dynamic style "Fooocus V2". (similar to Midjourney's hidden pre-processing and "raw" mode, or the LeonardoAI's Prompt Magic).
    2. Native refiner swap inside one single k-sampler. The advantage is that the refiner model can now reuse the base model's momentum (or ODE's history parameters) collected from k-sampling to achieve more coherent sampling. In Automatic1111's high-res fix and ComfyUI's node system, the base model and refiner use two independent k-samplers, which means the momentum is largely wasted, and the sampling continuity is broken. Fooocus uses its own advanced k-diffusion sampling that ensures seamless, native, and continuous swap in a refiner setup. (Update Aug 13: Actually, I discussed this with Automatic1111 several days ago, and it seems that the “native refiner swap inside one single k-sampler” is merged into the dev branch of webui. Great!)
    3. Negative ADM guidance. Because the highest resolution level of XL Base does not have cross attentions, the positive and negative signals for XL's highest resolution level cannot receive enough contrasts during the CFG sampling, causing the results to look a bit plastic or overly smooth in certain cases. Fortunately, since the XL's highest resolution level is still conditioned on image aspect ratios (ADM), we can modify the adm on the positive/negative side to compensate for the lack of CFG contrast in the highest resolution level. (Update Aug 16, the IOS App Draw Things will support Negative ADM Guidance. Great!)
    4. We implemented a carefully tuned variation of Section 5.1 of "Improving Sample Quality of Diffusion Models Using Self-Attention Guidance". The weight is set to very low, but this is Fooocus's final guarantee to make sure that the XL will never yield an overly smooth or plastic appearance (examples here). This can almost eliminate all cases for which XL still occasionally produces overly smooth results, even with negative ADM guidance. (Update 2023 Aug 18, the Gaussian kernel of SAG is changed to an anisotropic kernel for better structure preservation and fewer artifacts.)
    5. We modified the style templates a bit and added the "cinematic-default".
    6. We tested the "sd_xl_offset_example-lora_1.0.safetensors" and it seems that when the lora weight is below 0.5, the results are always better than XL without lora.
    7. The parameters of samplers are carefully tuned.
    8. Because XL uses positional encoding for generation resolution, images generated by several fixed resolutions look a bit better than those from arbitrary resolutions (because the positional encoding is not very good at handling int numbers that are unseen during training). This suggests that the resolutions in UI may be hard coded for best results.
    9. Separated prompts for two different text encoders seem unnecessary. Separated prompts for the base model and refiner may work, but the effects are random, and we refrain from implementing this.
    10. The DPM family seems well-suited for XL since XL sometimes generates overly smooth texture, but the DPM family sometimes generates overly dense detail in texture. Their joint effect looks neutral and appealing to human perception.
    11. A carefully designed system for balancing multiple styles as well as prompt expansion.
    12. Using automatic1111's method to normalize prompt emphasizing. This significantly improves results when users directly copy prompts from civitai.
    13. The joint swap system of the refiner now also supports img2img and upscale in a seamless way.
    14. CFG Scale and TSNR correction (tuned for SDXL) when CFG is bigger than 10.

    Customization

    After the first time you run Fooocus, a config file will be generated at Fooocus\config.txt. This file can be edited to change the model path or default parameters.

    For example, an edited Fooocus\config.txt (this file will be generated after the first launch) may look like this:

    {
        "path_checkpoints": "D:\\Fooocus\\models\\checkpoints",
        "path_loras": "D:\\Fooocus\\models\\loras",
        "path_embeddings": "D:\\Fooocus\\models\\embeddings",
        "path_vae_approx": "D:\\Fooocus\\models\\vae_approx",
        "path_upscale_models": "D:\\Fooocus\\models\\upscale_models",
        "path_inpaint": "D:\\Fooocus\\models\\inpaint",
        "path_controlnet": "D:\\Fooocus\\models\\controlnet",
        "path_clip_vision": "D:\\Fooocus\\models\\clip_vision",
        "path_fooocus_expansion": "D:\\Fooocus\\models\\prompt_expansion\\fooocus_expansion",
        "path_outputs": "D:\\Fooocus\\outputs",
        "default_model": "realisticStockPhoto_v10.safetensors",
        "default_refiner": "",
        "default_loras": [["lora_filename_1.safetensors", 0.5], ["lora_filename_2.safetensors", 0.5]],
        "default_cfg_scale": 3.0,
        "default_sampler": "dpmpp_2m",
        "default_scheduler": "karras",
        "default_negative_prompt": "low quality",
        "default_positive_prompt": "",
        "default_styles": [
            "Fooocus V2",
            "Fooocus Photograph",
            "Fooocus Negative"
        ]
    }

    Many other keys, formats, and examples are in Fooocus\config_modification_tutorial.txt (this file will be generated after the first launch).

    Think twice before you really change the config. If you find yourself breaking things, just delete Fooocus\config.txt, and Fooocus will go back to the defaults.

    A safer way is just to try "run_anime.bat" or "run_realistic.bat" - they should already be good enough for different tasks.

    Note that user_path_config.txt is deprecated and has already been removed.

    All CMD Flags

    entry_with_update.py  [-h] [--listen [IP]] [--port PORT]
                          [--disable-header-check [ORIGIN]]
                          [--web-upload-size WEB_UPLOAD_SIZE]
                          [--hf-mirror HF_MIRROR]
                          [--external-working-path PATH [PATH ...]]
                          [--output-path OUTPUT_PATH]
                          [--temp-path TEMP_PATH] [--cache-path CACHE_PATH]
                          [--in-browser] [--disable-in-browser]
                          [--gpu-device-id DEVICE_ID]
                          [--async-cuda-allocation | --disable-async-cuda-allocation]
                          [--disable-attention-upcast]
                          [--all-in-fp32 | --all-in-fp16]
                          [--unet-in-bf16 | --unet-in-fp16 | --unet-in-fp8-e4m3fn | --unet-in-fp8-e5m2]
                          [--vae-in-fp16 | --vae-in-fp32 | --vae-in-bf16]
                          [--vae-in-cpu]
                          [--clip-in-fp8-e4m3fn | --clip-in-fp8-e5m2 | --clip-in-fp16 | --clip-in-fp32]
                          [--directml [DIRECTML_DEVICE]]
                          [--disable-ipex-hijack]
                          [--preview-option [none,auto,fast,taesd]]
                          [--attention-split | --attention-quad | --attention-pytorch]
                          [--disable-xformers]
                          [--always-gpu | --always-high-vram | --always-normal-vram | --always-low-vram | --always-no-vram | --always-cpu [CPU_NUM_THREADS]]
                          [--always-offload-from-vram]
                          [--pytorch-deterministic] [--disable-server-log]
                          [--debug-mode] [--is-windows-embedded-python]
                          [--disable-server-info] [--multi-user] [--share]
                          [--preset PRESET] [--disable-preset-selection]
                          [--language LANGUAGE]
                          [--disable-offload-from-vram] [--theme THEME]
                          [--disable-image-log] [--disable-analytics]
                          [--disable-metadata] [--disable-preset-download]
                          [--disable-enhance-output-sorting]
                          [--enable-auto-describe-image]
                          [--always-download-new-model]
                          [--rebuild-hash-cache [CPU_NUM_THREADS]]
    

    Inline Prompt Features

    Wildcards

    Example prompt: __color__ flower

    Processed for positive and negative prompt.

    Selects a random wildcard from a predefined list of options, in this case the wildcards/color.txt file. The wildcard will be replaced with a random color (randomness based on seed). You can also disable randomness and process a wildcard file from top to bottom by enabling the checkbox Read wildcards in order in Developer Debug Mode.

    Wildcards can be nested and combined, and multiple wildcards can be used in the same prompt (example see wildcards/color_flower.txt).
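    As an illustration, a wildcard file is typically just one option per line (this is an assumption about the format; check the bundled wildcards/color.txt for the authoritative layout). A hypothetical wildcards/color.txt might look like:

    red
    pale blue
    golden yellow

    With the prompt __color__ flower, one of these lines then replaces __color__ on each generation (or they are consumed top to bottom when Read wildcards in order is enabled).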

    Array Processing

    Example prompt: [[red, green, blue]] flower

    Processed only for positive prompt.

    Processes the array from left to right, generating a separate image for each element in the array. In this case 3 images would be generated, one for each color. Increase the image number to 3 to generate all 3 variants.

    Arrays cannot be nested, but multiple arrays can be used in the same prompt. Inline LoRAs are supported as array elements.
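    To make the expansion concrete: with the example prompt above and the image number set to 3, the three generated prompts would effectively be:

    red flower
    green flower
    blue flower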

    Inline LoRAs

    Example prompt: flower <lora:sunflowers:1.2>

    Processed only for positive prompt.

    Applies a LoRA to the prompt. The LoRA file must be located in the models/loras directory.

    Advanced Features

    Click here to browse the advanced features.

    Forks

    Below are some forks of Fooocus:

    • fenneishi/Fooocus-Control
    • runew0lf/RuinedFooocus
    • MoonRide303/Fooocus-MRE
    • mashb1t/Fooocus
    • and so on ...

    Thanks

    Many thanks to twri and 3Diva and Marc K3nt3L for creating additional SDXL styles available in Fooocus.

    The project starts from a mixture of Stable Diffusion WebUI and ComfyUI codebases.

    Also, thanks to daswer123 for contributing the Canvas Zoom!

    Update Log

    The log is here.

    Localization/Translation/I18N

    You can put json files in the language folder to translate the user interface.

    For example, below is the content of Fooocus/language/example.json:

    {
      "Generate": "生成",
      "Input Image": "入力画像",
      "Advanced": "고급",
      "SAI 3D Model": "SAI 3D Modèle"
    }

    If you add --language example arg, Fooocus will read Fooocus/language/example.json to translate the UI.

    For example, you can edit the ending line of Windows run.bat as

    .\python_embeded\python.exe -s Fooocus\entry_with_update.py --language example
    

    Or run_anime.bat as

    .\python_embeded\python.exe -s Fooocus\entry_with_update.py --language example --preset anime
    

    Or run_realistic.bat as

    .\python_embeded\python.exe -s Fooocus\entry_with_update.py --language example --preset realistic
    

    For practical translation, you may create your own file like Fooocus/language/jp.json or Fooocus/language/cn.json and then use flag --language jp or --language cn. Apparently, these files do not exist now. We need your help to create these files!

    Note that if no --language is given and at the same time Fooocus/language/default.json exists, Fooocus will always load Fooocus/language/default.json for translation. By default, the file Fooocus/language/default.json does not exist.

    from  https://github.com/lllyasviel/Fooocus