2025-12-03: 📣 We open-sourced VibeVoice‑Realtime‑0.5B, a real‑time text‑to‑speech model that supports streaming text input and robust long-form speech generation. Try it on Colab.
2025-12-09: 📣 We’ve added experimental speakers in nine languages (DE, FR, IT, JP, KR, NL, PL, PT, ES) for exploration; we welcome you to try them out and share your feedback. To mitigate deepfake risks and ensure low latency for the first speech chunk, voice prompts are provided in an embedded format. If you require voice customization, please reach out to our team. We will also be expanding the range of available speakers.
VibeVoice_Realtime.mp4
(Launch your own realtime demo via the websocket example in Usage).
2025-09-05: VibeVoice is an open-source research framework
intended to advance collaboration in the speech synthesis community.
After release, we discovered instances where the tool was used in ways
inconsistent with the stated intent. Since responsible use of AI is one
of Microsoft’s guiding principles, we have disabled this repo until we
are confident that out-of-scope use is no longer possible.
Overview
VibeVoice is a novel framework designed for generating expressive, long-form, multi-speaker
conversational audio, such as podcasts, from text. It addresses
significant challenges in traditional Text-to-Speech (TTS) systems,
particularly in scalability, speaker consistency, and natural
turn-taking.
VibeVoice currently includes two model variants:
Long-form multi-speaker model: Synthesizes conversational/single-speaker speech up to 90 minutes with up to 4 distinct speakers, surpassing the typical 1–2 speaker limits of many prior models.
Realtime streaming TTS model: Produces initial audible speech in ~300 ms and supports streaming text input for single-speaker real-time speech generation; designed for low-latency generation.
A core innovation of VibeVoice is its use of continuous
speech tokenizers (Acoustic and Semantic) operating at an ultra-low
frame rate of 7.5 Hz. These tokenizers efficiently preserve audio
fidelity while significantly boosting computational efficiency for
processing long sequences. VibeVoice employs a next-token diffusion
framework, leveraging a Large Language Model (LLM) to understand
textual context and dialogue flow, and a diffusion head to generate
high-fidelity acoustic details.
🎵 Demo Examples
Video Demo
We produced this video with Wan2.2. We sincerely appreciate the Wan-Video team for their great work.
Risks and limitations
While efforts have been made to optimize the model through various techniques, it may still produce outputs that are unexpected, biased, or inaccurate. VibeVoice inherits any biases, errors, or omissions produced by its base model (specifically, Qwen2.5-1.5B in this release).
Potential for Deepfakes and Disinformation: High-quality synthetic
speech can be misused to create convincing fake audio content for
impersonation, fraud, or spreading disinformation. Users must ensure
transcripts are reliable, check content accuracy, and avoid using
generated content in misleading ways. Users are expected to use the
generated content and to deploy the models in a lawful manner, in full
compliance with all applicable laws and regulations in the relevant
jurisdictions. It is best practice to disclose the use of AI when
sharing AI-generated content.
English and Chinese only: Transcripts in languages other than English or Chinese may result in unexpected audio outputs.
Non-Speech Audio: The model focuses solely on speech
synthesis and does not handle background noise, music, or other sound
effects.
Overlapping Speech: The current model does not explicitly model or generate overlapping speech segments in conversations.
We do not recommend using VibeVoice in commercial or
real-world applications without further testing and development. This
model is intended for research and development purposes only. Please use
responsibly.
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners like Ollama and OpenAI-compatible APIs, with a built-in inference engine for RAG, making it a powerful AI deployment solution.
🚀 Effortless Setup: Install seamlessly using Docker or Kubernetes (kubectl, kustomize or helm) for a hassle-free experience with support for both :ollama and :cuda tagged images.
🤝 Ollama/OpenAI API Integration:
Effortlessly integrate OpenAI-compatible APIs for versatile
conversations alongside Ollama models. Customize the OpenAI API URL to
link with LMStudio, GroqCloud, Mistral, OpenRouter, and more.
🛡️ Granular Permissions and User Groups:
By allowing administrators to create detailed user roles and
permissions, we ensure a secure user environment. This granularity not
only enhances security but also allows for customized user experiences,
fostering a sense of ownership and responsibility amongst users.
📱 Responsive Design: Enjoy a seamless experience across Desktop PC, Laptop, and Mobile devices.
📱 Progressive Web App (PWA) for Mobile:
Enjoy a native app-like experience on your mobile device with our PWA,
providing offline access on localhost and a seamless user interface.
✒️🔢 Full Markdown and LaTeX Support: Elevate your LLM experience with comprehensive Markdown and LaTeX capabilities for enriched interaction.
🎤📹 Hands-Free Voice/Video Call:
Experience seamless communication with integrated hands-free voice and
video call features using multiple Speech-to-Text providers (Local
Whisper, OpenAI, Deepgram, Azure) and Text-to-Speech engines (Azure,
ElevenLabs, OpenAI, Transformers, WebAPI), allowing for dynamic and
interactive chat environments.
🛠️ Model Builder: Easily create Ollama
models via the Web UI. Create and add custom characters/agents,
customize chat elements, and import models effortlessly through Open WebUI Community integration.
🐍 Native Python Function Calling Tool:
Enhance your LLMs with built-in code editor support in the tools
workspace. Bring Your Own Function (BYOF) by simply adding your pure
Python functions, enabling seamless integration with LLMs.
💾 Persistent Artifact Storage: Built-in
key-value storage API for artifacts, enabling features like journals,
trackers, leaderboards, and collaborative tools with both personal and
shared data scopes across sessions.
📚 Local RAG Integration: Dive into the
future of chat interactions with groundbreaking Retrieval Augmented
Generation (RAG) support using your choice of 9 vector databases and
multiple content extraction engines (Tika, Docling, Document
Intelligence, Mistral OCR, External loaders). Load documents directly
into chat or add files to your document library, effortlessly accessing
them using the # command before a query.
🔍 Web Search for RAG: Perform web searches using 15+ providers including SearXNG, Google PSE, Brave Search, Kagi, Mojeek, Tavily, Perplexity, serpstack, serper, Serply, DuckDuckGo, SearchApi, SerpApi, Bing, Jina, Exa, Sougou, Azure AI Search, and Ollama Cloud, injecting results directly into your chat experience.
🌐 Web Browsing Capability: Seamlessly integrate websites into your chat experience using the #
command followed by a URL. This feature allows you to incorporate web
content directly into your conversations, enhancing the richness and
depth of your interactions.
🎨 Image Generation & Editing Integration:
Create and edit images using multiple engines including OpenAI's
DALL-E, Gemini, ComfyUI (local), and AUTOMATIC1111 (local), with support
for both generation and prompt-based editing workflows.
⚙️ Many Models Conversations:
Effortlessly engage with various models simultaneously, harnessing their
unique strengths for optimal responses. Enhance your experience by
leveraging a diverse set of models in parallel.
🔐 Role-Based Access Control (RBAC):
Ensure secure access with restricted permissions; only authorized
individuals can access your Ollama, and exclusive model creation/pulling
rights are reserved for administrators.
🗄️ Flexible Database & Storage Options:
Choose from SQLite (with optional encryption), PostgreSQL, or configure
cloud storage backends (S3, Google Cloud Storage, Azure Blob Storage)
for scalable deployments.
🔍 Advanced Vector Database Support:
Select from 9 vector database options including ChromaDB, PGVector,
Qdrant, Milvus, Elasticsearch, OpenSearch, Pinecone, S3Vector, and
Oracle 23ai for optimal RAG performance.
🔐 Enterprise Authentication: Full
support for LDAP/Active Directory integration, SCIM 2.0 automated
provisioning, and SSO via trusted headers alongside OAuth providers.
Enterprise-grade user and group provisioning through SCIM 2.0 protocol,
enabling seamless integration with identity providers like Okta, Azure
AD, and Google Workspace for automated user lifecycle management.
☁️ Cloud-Native Integration: Native
support for Google Drive and OneDrive/SharePoint file picking, enabling
seamless document import from enterprise cloud storage.
📊 Production Observability: Built-in
OpenTelemetry support for traces, metrics, and logs, enabling
comprehensive monitoring with your existing observability stack.
⚖️ Horizontal Scalability: Redis-backed session management and WebSocket support for multi-worker and multi-node deployments behind load balancers.
🌐🌍 Multilingual Support: Experience
Open WebUI in your preferred language with our internationalization
(i18n) support. Join us in expanding our supported languages! We're
actively seeking contributors!
🧩 Pipelines, Open WebUI Plugin Support: Seamlessly integrate custom logic and Python libraries into Open WebUI using Pipelines Plugin Framework. Launch your Pipelines instance, set the OpenAI URL to the Pipelines URL, and explore endless possibilities. Examples include Function Calling, User Rate Limiting to control access, Usage Monitoring with tools like Langfuse, Live Translation with LibreTranslate for multilingual support, Toxic Message Filtering and much more.
🌟 Continuous Updates: We are committed to improving Open WebUI with regular updates, fixes, and new features.
Want to learn more about Open WebUI's features? Check out our Open WebUI documentation for a comprehensive overview!
We are incredibly grateful for the generous support of our
sponsors. Their contributions help us to maintain and improve our
project, ensuring we can continue to deliver quality work to our
community. Thank you!
How to Install 🚀
Installation via Python pip 🐍
Open WebUI can be installed using pip, the Python package installer. Before proceeding, ensure you're using Python 3.11 to avoid compatibility issues.
Install Open WebUI:
Open your terminal and run the following command to install Open WebUI:
pip install open-webui
Running Open WebUI:
After installation, you can start Open WebUI by executing:
open-webui serve
This will start the Open WebUI server, which you can access at http://localhost:8080
Quick Start with Docker 🐳
Note
Please
note that for certain Docker environments, additional configurations
might be needed. If you encounter any connection issues, our detailed
guide on Open WebUI Documentation is ready to assist you.
Warning
When using Docker to install Open WebUI, make sure to include the -v open-webui:/app/backend/data volume mount in your Docker command. This step is crucial, as it ensures your database is properly mounted and prevents any loss of data.
Tip
If
you wish to utilize Open WebUI with Ollama included or CUDA
acceleration, we recommend utilizing our official images tagged with
either :cuda or :ollama. To enable CUDA, you must install the Nvidia CUDA container toolkit on your Linux/WSL system.
Installation with Default Configuration
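If Ollama is on your computer, a command along the lines of the following should work (a hedged example that mirrors the CUDA command shown below, but uses the default :main image and no GPU flags):
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main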
If Ollama is on a Different Server, use this command:
To connect to Ollama on another server, change the OLLAMA_BASE_URL to the server's URL:
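For example (a hedged variant of the default command; replace https://example.com with your Ollama server's address):
docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=https://example.com -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main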
To run Open WebUI with Nvidia GPU support, use this command:
docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda
Installation for OpenAI API Usage Only
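If you're only using the OpenAI API, a sketch of the command (assuming the OPENAI_API_KEY environment variable; replace your_secret_key with your own key):
docker run -d -p 3000:8080 -e OPENAI_API_KEY=your_secret_key -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main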
Installing Open WebUI with Bundled Ollama Support
This
installation method uses a single container image that bundles Open
WebUI with Ollama, allowing for a streamlined setup via a single
command. Choose the appropriate command based on your hardware setup:
With GPU Support:
Utilize GPU resources by running the following command:
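A hedged sketch, following the pattern of the other Docker commands in this guide (the extra ollama:/root/.ollama volume is assumed to hold Ollama's model data):
docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
For CPU only, the same :ollama image can be run without the GPU flag:
docker run -d -p 3000:8080 -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama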
Both commands facilitate a built-in,
hassle-free installation of both Open WebUI and Ollama, ensuring that
you can get everything up and running swiftly.
We
offer various installation alternatives, including non-Docker native
installation methods, Docker Compose, Kustomize, and Helm. Visit our Open WebUI Documentation or join our Discord community for comprehensive guidance.
Look at the Local Development Guide for instructions on setting up a local development environment.
Troubleshooting
Encountering connection issues? Our Open WebUI Documentation has got you covered. For further assistance and to join our vibrant community, visit the Open WebUI Discord.
Open WebUI: Server Connection Error
If you're experiencing connection issues, it’s often due to the WebUI Docker container not being able to reach the Ollama server at 127.0.0.1:11434 (host.docker.internal:11434) inside the container. Use the --network=host flag in your docker command to resolve this. Note that the port changes from 3000 to 8080, resulting in the link: http://localhost:8080.
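A hedged example of such a command (a variant of the default setup with host networking and an explicit OLLAMA_BASE_URL):
docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main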
If you are running Open WebUI in an offline environment, you can set the HF_HUB_OFFLINE environment variable to 1 to prevent attempts to download models from the internet.
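For example, with Docker you can pass the variable by adding -e HF_HUB_OFFLINE=1 to whichever run command you use (a hedged sketch based on the default command above):
docker run -d -p 3000:8080 -e HF_HUB_OFFLINE=1 -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main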
Fooocus is an image generation software (based on Gradio).
Fooocus presents a rethinking of image generator designs. The software is offline, open source, and free, while at the same time, similar to many online image generators like Midjourney, manual tweaking is not needed: users only need to focus on the prompts and images. Fooocus has also simplified installation: between pressing "download" and generating the first image, the number of needed mouse clicks is strictly limited to fewer than 3. The minimum GPU memory requirement is 4GB (Nvidia).
Recently, many fake websites have appeared in Google search results for “fooocus”. Do not trust them; this page is the only official source of Fooocus.
Project Status: Limited Long-Term Support (LTS) with Bug Fixes Only
The Fooocus project, built entirely on the Stable Diffusion XL architecture, is now in a state of limited long-term support (LTS) with bug fixes only. As the existing functionalities are considered nearly free of programmatic issues (thanks to mashb1t's huge efforts), future updates will focus exclusively on addressing any bugs that may arise.
There are no current plans to migrate to or incorporate newer model architectures.
However, this may change over time as the open-source community develops. For example, if the community converges on a single dominant method for image generation (which may well happen within half a year to a year given the current status), Fooocus may also migrate to that exact method.
For those interested in utilizing newer models such as Flux, we recommend exploring alternative platforms such as WebUI Forge (also from us), ComfyUI/SwarmUI. Additionally, several excellent forks of Fooocus are available for experimentation.
Again, many fake websites have recently appeared in Google search results for “fooocus”. Do NOT get Fooocus from those websites; this page is the only official source of Fooocus. We do not have any website such as “fooocus.com”, “fooocus.net”, “fooocus.co”, “fooocus.ai”, “fooocus.org”, “fooocus.pro”, or “fooocus.one”. Those websites are ALL FAKE. They have ABSOLUTELY no relationship to us. Fooocus is a 100% non-commercial, offline, open-source software.
Features
Below is a quick comparison using Midjourney's examples:

| Midjourney | Fooocus |
| --- | --- |
| High-quality text-to-image without needing much prompt engineering or parameter tuning. (Unknown method) | High-quality text-to-image without needing much prompt engineering or parameter tuning. (Fooocus has an offline GPT-2 based prompt processing engine and lots of sampling improvements, so that results are always beautiful, no matter if your prompt is as short as “house in garden” or as long as 1000 words) |
| Inpaint / Up / Down / Left / Right | Input Image -> Inpaint or Outpaint -> Inpaint / Up / Down / Left / Right (Fooocus uses its own inpaint algorithm and inpaint models, so that results are more satisfying than all other software that uses the standard SDXL inpaint method/model) |
| Image Prompt | Input Image -> Image Prompt (Fooocus uses its own image prompt algorithm, so that result quality and prompt understanding are more satisfying than all other software that uses standard SDXL methods like standard IP-Adapters or Revisions) |
| Prompt Weights | You can use "I am (happy:1.5)". Fooocus uses A1111's reweighting algorithm, so that results are better than ComfyUI if users directly copy prompts from Civitai. (Because if prompts are written in ComfyUI's reweighting, users are less likely to copy prompt texts, as they prefer dragging files.) To use an embedding, you can use "(embedding:file_name:1.1)" |
After you download the file, please uncompress it and then run "run.bat".
The first time you launch the software, it will automatically download models:
It will download default models to the folder "Fooocus\models\checkpoints", depending on the preset. You can download them in advance if you do not want the automatic download.
Note that if you use inpaint, at the first time you inpaint an image, it will download Fooocus's own inpaint control model from here as the file "Fooocus\models\inpaint\inpaint_v26.fooocus.patch" (the size of this file is 1.28GB).
After Fooocus 2.1.60, you will also have run_anime.bat and run_realistic.bat. They are different model presets (and require different models, but they will be automatically downloaded). Check here for more details.
After Fooocus 2.3.0, you can also switch presets directly in the browser. Keep in mind that you need to add these arguments if you want to change the default behavior:
Use --disable-preset-selection to disable preset selection in the browser.
Use --always-download-new-model to download missing models on preset switch. The default is to fall back to previous_default_models defined in the corresponding preset; see also the terminal output.
If you already have these files, you can copy them to the above locations to speed up installation.
Note that if you see "MetadataIncompleteBuffer" or "PytorchStreamReader", then your model files are corrupted. Please download models again.
Below is a test on a relatively low-end laptop with 16GB System RAM and 6GB VRAM
(Nvidia 3060 laptop). The speed on this machine is about 1.35 seconds
per iteration. Pretty impressive – nowadays, laptops with a 3060 are usually available at a very acceptable price.
Besides, many other software projects have recently reported that Nvidia drivers above 532 are sometimes 10x slower than Nvidia driver 531. If your generation time is very long, consider downloading Nvidia Driver 531 Laptop or Nvidia Driver 531 Desktop.
Note that the minimal requirement is 4GB Nvidia GPU memory (4GB VRAM) and 8GB system memory (8GB RAM).
This requires using Microsoft’s Virtual Swap technique, which is
automatically enabled by your Windows installation in most cases, so you
often do not need to do anything about it. However, if you are not
sure, or if you manually turned it off (would anyone really do that?),
or if you see any "RuntimeError: CPUAllocator", you can enable it here:
Click here to see the image instructions.
And make sure that you have at least 40GB of free space on each drive if you still see "RuntimeError: CPUAllocator"!
Please open an issue if you use similar devices but still cannot achieve acceptable performance.
In Colab, you can modify the last line to !python entry_with_update.py --share --always-high-vram or !python entry_with_update.py --share --always-high-vram --preset anime or !python entry_with_update.py --share --always-high-vram --preset realistic for Fooocus Default/Anime/Realistic Edition.
You can also change the preset in the UI. Please be aware that this may lead to timeouts after 60 seconds. If this happens, please wait until the download has finished, change the preset back to the initial one and then to the one you've selected, or reload the page.
Note that this Colab will disable refiner by default
because Colab free's resources are relatively limited (and some "big"
features like image prompt may cause free-tier Colab to disconnect). We
make sure that basic text-to-image is always working on free-tier Colab.
Using --always-high-vram shifts resource
allocation from RAM to VRAM and achieves the overall best balance
between performance, flexibility and stability on the default T4
instance. Please find more information here.
Then download the models: download default models to the folder "Fooocus\models\checkpoints". Or let Fooocus automatically download the models using the launcher:
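python entry_with_update.py
(This is the same launcher used in the Linux sections below; on first run it downloads any missing default models.)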
Use python entry_with_update.py --preset anime or python entry_with_update.py --preset realistic for Fooocus Anime/Realistic Edition.
Linux (Using Python Venv)
Your Linux needs to have Python 3.10 installed; assuming your Python can be called with the command python3 and your venv system works, you can set it up as follows.
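A minimal sketch (the venv name fooocus_env is only an illustration; the repository URL and requirements file are the same as in the native-Python section below):
git clone https://github.com/lllyasviel/Fooocus.git
cd Fooocus
python3 -m venv fooocus_env
source fooocus_env/bin/activate
pip install -r requirements_versions.txt
python entry_with_update.py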
Use python entry_with_update.py --preset anime or python entry_with_update.py --preset realistic for Fooocus Anime/Realistic Edition.
Linux (Using native system Python)
If you know what you are doing, and your Linux already has Python 3.10 installed, and your Python can be called with the command python3 (and Pip with pip3), you can
git clone https://github.com/lllyasviel/Fooocus.git
cd Fooocus
pip3 install -r requirements_versions.txt
See the above sections for model downloads. You can launch the software with:
python3 entry_with_update.py
Or, if you want to open a remote port, use
python3 entry_with_update.py --listen
Use python entry_with_update.py --preset anime or python entry_with_update.py --preset realistic for Fooocus Anime/Realistic Edition.
However, AMD is not intensively tested, and AMD support is in beta.
For AMD, use .\python_embeded\python.exe Fooocus\entry_with_update.py --directml --preset anime or .\python_embeded\python.exe Fooocus\entry_with_update.py --directml --preset realistic for Fooocus Anime/Realistic Edition.
Mac is not intensively tested. Below is an unofficial guideline for using Mac. You can discuss problems here.
You can install Fooocus on Apple silicon Macs (M1 or M2) with macOS 'Catalina' or a newer version. Fooocus runs on Apple silicon computers via PyTorch MPS device acceleration. Apple silicon computers don't come with a dedicated graphics card, resulting in significantly longer image processing times compared to computers with dedicated graphics cards.
Install the conda package manager and pytorch nightly. Read the Accelerated PyTorch training on Mac Apple Developer guide for instructions. Make sure pytorch recognizes your MPS device.
Open the macOS Terminal app and clone this repository with git clone https://github.com/lllyasviel/Fooocus.git.
Change to the new Fooocus directory, cd Fooocus.
Create a new conda environment, conda env create -f environment.yaml.
Activate your new conda environment, conda activate fooocus.
Install the packages required by Fooocus, pip install -r requirements_versions.txt.
Launch Fooocus by running python entry_with_update.py. (Some Mac M2 users may need python entry_with_update.py --disable-offload-from-vram
to speed up model loading/unloading.) The first time you run Fooocus,
it will automatically download the Stable Diffusion SDXL models and will
take a significant amount of time, depending on your internet
connection.
Use python entry_with_update.py --preset anime or python entry_with_update.py --preset realistic for Fooocus Anime/Realistic Edition.
Below
is the minimal requirement for running Fooocus locally. If your device
capability is lower than this spec, you may not be able to use Fooocus
locally. (Please let us know, in any case, if your device capability is
lower but Fooocus still works.)
| Operating System | GPU | Minimal GPU Memory | Minimal System Memory | System Swap | Note |
| --- | --- | --- | --- | --- | --- |
| Windows | AMD GPU | 8GB | 8GB | Required | via DirectML (* ROCm is on hold), about 3x slower than Nvidia RTX 3XXX |
| Linux | AMD GPU | 8GB | 8GB | Required | via ROCm, about 1.5x slower than Nvidia RTX 3XXX |
| Mac | M1/M2 MPS | Shared | Shared | Shared | about 9x slower than Nvidia RTX 3XXX |
| Windows/Linux/Mac | only use CPU | 0GB | 32GB | Required | about 17x slower than Nvidia RTX 3XXX |
* AMD GPU ROCm (on hold): AMD is still working on supporting ROCm on Windows.
* Nvidia GTX 1XXX 6GB uncertain: Some people report success with 6GB on GTX 10XX cards, but others report failure cases.
Note that Fooocus is only for extremely high-quality image generation. We will not support smaller models that reduce the requirements at the cost of result quality.
Note that the download is automatic; you do not need to do anything if the internet connection is okay. However, you can download the models manually (or move them from somewhere else) if you have your own preparation.
UI Access and Authentication
In addition to running on localhost, Fooocus can also expose its UI in two ways:
Local UI listener: use --listen (specify port e.g. with --port 8888).
API access: use --share (registers an endpoint at .gradio.live).
In both cases, access is unauthenticated by default. You can add basic authentication by creating a file called auth.json in the main directory, which contains a list of JSON objects with the keys user and pass (see example in auth-example.json).
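A minimal sketch of such a file (the credentials below are placeholders; see auth-example.json for the reference format):
[
    {"user": "admin", "pass": "changeme"}
]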
List of "Hidden" Tricks
Click to see a list of tricks. These are based on SDXL and are not very up-to-date with the latest models.
Native refiner swap inside one single k-sampler. The advantage is
that the refiner model can now reuse the base model's momentum (or ODE's
history parameters) collected from k-sampling to achieve more coherent
sampling. In Automatic1111's high-res fix and ComfyUI's node system, the
base model and refiner use two independent k-samplers, which means the
momentum is largely wasted, and the sampling continuity is broken.
Fooocus uses its own advanced k-diffusion sampling that ensures
seamless, native, and continuous swap in a refiner setup. (Update Aug
13: Actually, I discussed this with Automatic1111 several days ago, and
it seems that the “native refiner swap inside one single k-sampler” is merged into the dev branch of webui. Great!)
Negative ADM guidance. Because the highest resolution level of XL
Base does not have cross attentions, the positive and negative signals
for XL's highest resolution level cannot receive enough contrasts during
the CFG sampling, causing the results to look a bit plastic or overly
smooth in certain cases. Fortunately, since the XL's highest resolution
level is still conditioned on image aspect ratios (ADM), we can modify
the adm on the positive/negative side to compensate for the lack of CFG
contrast in the highest resolution level. (Update Aug 16, the IOS App Draw Things will support Negative ADM Guidance. Great!)
We implemented a carefully tuned variation of Section 5.1 of "Improving Sample Quality of Diffusion Models Using Self-Attention Guidance".
The weight is set to very low, but this is Fooocus's final guarantee to
make sure that the XL will never yield an overly smooth or plastic
appearance (examples here).
This can almost eliminate all cases for which XL still occasionally
produces overly smooth results, even with negative ADM guidance. (Update
2023 Aug 18, the Gaussian kernel of SAG is changed to an anisotropic
kernel for better structure preservation and fewer artifacts.)
We modified the style templates a bit and added the "cinematic-default".
We tested the "sd_xl_offset_example-lora_1.0.safetensors" and it
seems that when the lora weight is below 0.5, the results are always
better than XL without lora.
The parameters of samplers are carefully tuned.
Because XL uses positional encoding for generation resolution,
images generated by several fixed resolutions look a bit better than
those from arbitrary resolutions (because the positional encoding is not
very good at handling int numbers that are unseen during training).
This suggests that the resolutions in UI may be hard coded for best
results.
Separated prompts for two different text encoders seem unnecessary.
Separated prompts for the base model and refiner may work, but the
effects are random, and we refrain from implementing this.
The DPM family seems well-suited for XL since XL sometimes generates
overly smooth texture, but the DPM family sometimes generates overly
dense detail in texture. Their joint effect looks neutral and appealing
to human perception.
A carefully designed system for balancing multiple styles as well as prompt expansion.
Using automatic1111's method to normalize prompt emphasizing. This
significantly improves results when users directly copy prompts from
civitai.
The joint swap system of the refiner now also supports img2img and upscale in a seamless way.
CFG Scale and TSNR correction (tuned for SDXL) when CFG is bigger than 10.
Customization
After the first time you run Fooocus, a config file will be generated at Fooocus\config.txt. This file can be edited to change the model path or default parameters.
For example, an edited Fooocus\config.txt (this file will be generated after the first launch) may look like this:
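A hedged illustration only (the key names follow the pattern documented in Fooocus\config_modification_tutorial.txt; the paths and model name below are placeholders to adapt to your setup):
{
    "path_checkpoints": "D:\\Fooocus\\models\\checkpoints",
    "path_outputs": "D:\\Fooocus\\outputs",
    "default_model": "my_model.safetensors",
    "default_cfg_scale": 4.0
}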
Many other keys, formats, and examples are in Fooocus\config_modification_tutorial.txt (this file will be generated after the first launch).
Think twice before you change the config. If you find yourself breaking things, just delete Fooocus\config.txt. Fooocus will go back to default.
A safer way is just to try "run_anime.bat" or "run_realistic.bat" - they should already be good enough for different tasks.
Note that user_path_config.txt is deprecated and will be removed soon. (Edit: it is already removed.)
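Wildcards
Example prompt: __color__ flower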
Selects a random wildcard from a predefined list of options, in this case the wildcards/color.txt file.
The wildcard will be replaced with a random color (randomness based on seed).
You can also disable randomness and process a wildcard file from top to bottom by enabling the checkbox Read wildcards in order in Developer Debug Mode.
Wildcards can be nested and combined, and multiple wildcards can be used in the same prompt (example see wildcards/color_flower.txt).
Array Processing
Example prompt: [[red, green, blue]] flower
Processed only for positive prompt.
Processes the array from left to right, generating a
separate image for each element in the array. In this case 3 images
would be generated, one for each color.
Increase the image number to 3 to generate all 3 variants.
Arrays cannot be nested, but multiple arrays can be used in the same prompt.
Inline LoRAs are supported as array elements!
Inline LoRAs
Example prompt: flower <lora:sunflowers:1.2>
Processed only for positive prompt.
Applies a LoRA to the prompt. The LoRA file must be located in the models/loras directory.
You can put json files in the language folder to translate the user interface.
For example, below is the content of Fooocus/language/example.json:
{
"Generate": "生成",
"Input Image": "入力画像",
"Advanced": "고급",
"SAI 3D Model": "SAI 3D Modèle"
}
If you add --language example arg, Fooocus will read Fooocus/language/example.json to translate the UI.
For example, you can edit the ending line of Windows run.bat as
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --language example
Or run_anime.bat as
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --language example --preset anime
Or run_realistic.bat as
.\python_embeded\python.exe -s Fooocus\entry_with_update.py --language example --preset realistic
For practical translation, you may create your own file like Fooocus/language/jp.json or Fooocus/language/cn.json and then use the flag --language jp or --language cn. Apparently, these files do not exist now. We need your help to create these files!
Note that if no --language is given and at the same time Fooocus/language/default.json exists, Fooocus will always load Fooocus/language/default.json for translation. By default, the file Fooocus/language/default.json does not exist.