Total Pageviews

Wednesday, 25 February 2026

lobste.rs

https://lobste.rs ,国外的一个it技术论坛

漫步人生路

 

基于go的静态博客生成器gozer

 Fast, opinionated and simple static site generator.

Gozer is a fast & simple static site generator written in Golang.

  • Converts Markdown and djot to HTML.
  • Allows you to use page-specific templates.
  • Creates an XML sitemap for search engines.
  • Creates an RSS feed for feed readers.

Sample websites using Gozer:

Installation

You can install Gozer by first installing a Go compiler and then running:

go install github.com/dannyvankooten/gozer

Usage

Run gozer new to quickly generate an empty directory structure.

├── config.toml                # Configuration file
├── content                    # Posts and pages
│   └── index.md
├── public                     # Static files
└── templates                  # Template files
    └── default.html

Then, run gozer build to generate your site.

Any Markdown files placed in your content/ directory will result in an HTML page in your build directory after running gozer build.

For example:

  • content/index.md creates a file build/index.html so it is accessible over HTTP at /
  • content/about.md creates a file build/about/index.html so it is accessible over HTTP at /about/.

Commands

Run gozer without any arguments to view the help text.

Gozer - a fast & simple static site generator

Usage: gozer [OPTIONS] <COMMAND>

Commands:
    build   Deletes the output directory if there is one and builds the site
    serve   Builds the site and starts an HTTP server on http://localhost:8080
    watch   Builds the site and watches for file changes
    new     Creates a new site structure in the given directory

Options:
    -r, --root <ROOT> Directory to use as root of project (default: .)
    -c, --config <CONFIG> Path to configuration file (default: config.toml)
        --listen <INTERFACE:PORT> Interface to liston on; only used with 'serve',
                 'INTERFACE' is optional. e.g. '--listen :9000'

Content files

Each file in your content/ directory should end in .md or .dj and have TOML front matter specifying the page title:

+++
title = "My page title"
+++

Page content here.

djot note djot has not settled on a syntax for front matter. Until issue #35 is resolved, TOML front matter in djot documents are used.

Templates

The default template for every page is default.html. You can override it by setting the template variable in your front matter.

+++
title = "My page title"
template = "special-page.html"
+++

Page content here.

Templates are powered by Go's standard html/template package, so you can use all the actions described here.

Every template receives the following set of variables:

Pages       # Slice of all pages in the site
Posts       # Slice of all posts in the site (any page with a date in the filename)
Site        # Global site properties: Url, Title
Page        # The current page: Title, Permalink, UrlPath, DatePublished, DateModified
Title       # The current page title, shorthand for Page.Title
Content     # The current page's HTML content.
Now         # Timestamp of build, instance of time.Time

The Page variable is an instance of the object below:

type Page struct {
    // Title of this page
    Title         string

    // Template this page uses for rendering. Defaults to "default.html".
    Template      string

    // Time this page was published (parsed from file name).
    DatePublished time.Time

    // Time this page was last modified on the filesystem.
    DateModified  time.Time

    // The full URL to this page, including the site URL.
    Permalink     string

    // URL path for this page, relative to site URL
    UrlPath       string

    // Path to source file for this page, relative to content root
    Filepath      string
}

To show a list of the 5 most recent posts:

{{ range (slice .Posts 0 5) }}
    <a href="{{ .Permalink }}">{{ .Title }}</a> <small>{{ .DatePublished.Format "Jan 02, 2006" }}</small><br />
{{ end }} 
from  https://github.com/,/gozer
--------------------------------------------------------
搭建基于go的静态博客生成器gozer
首先安装go环境,然后 go install github.com/dannyvankooten/gozer@latest
运行此命令后,就会生成可执行文件gozer.
然后fork此项目 https://github.com/dannyvankooten/www.dannyvankooten.com,我fork后的项目地址是
 https://github.com/briteming/wdc。
git clone https://github.com/dannyvankooten/www.dannyvankooten.com wdc
cd wdc
 12799@DESKTOP-B6LK9IO MINGW64 ~/wdc (main)
$ ls
LICENSE bin/ config.toml content/ templates/
README.md config_prod.toml public/

12799@DESKTOP-B6LK9IO MINGW64 ~/wdc (main)
$ gozer build
(此即生成/更新静态网站的根目录的命令) 
12799@DESKTOP-B6LK9IO MINGW64 ~/wdc (main)
$ ls
LICENSE bin/ config.toml content/ templates/
README.md build/ config_prod.toml public/
(生成了build目录)

12799@DESKTOP-B6LK9IO MINGW64 ~/wdc (main)
$ cd build
12799@DESKTOP-B6LK9IO MINGW64 ~/wdc/build (main)
$ ls
2023/ code/ img/ privacy-policy/ sitemap.xsl
2025/ contact/ index.html projects/ style.css
404/ donate/ links/ public-key.txt wordpress-plugins/
about/ favicon.ico media/ robots.txt
blog/ feed.xml notebooks/ rss-icon.svg
bookmarks/ hire-me/ now/ sitemap.xml
(可见~/wdc/build就是静态网站的根目录)
12799@DESKTOP-B6LK9IO MINGW64 ~/wdc/build (main)
$
新建源帖:
12799@DESKTOP-B6LK9IO MINGW64 ~/wdc/build (main)
$ cd ../content

12799@DESKTOP-B6LK9IO MINGW64 ~/wdc/content (main)
$ ls
404.md bookmarks.md donate.md links.md projects.md
about.md code.md hire-me.md now.md wordpress-plugins.md
blog/ contact.md index.md privacy-policy.md

12799@DESKTOP-B6LK9IO MINGW64 ~/wdc/content (main)
$ cd blog

12799@DESKTOP-B6LK9IO MINGW64 ~/wdc/content/blog (main)
$ ls
2010/ 2012/ 2014/ 2016/ 2018/ 2020/ 2022/ 2024/ index.md
2011/ 2013/ 2015/ 2017/ 2019/ 2021/ 2023/ 2025/

12799@DESKTOP-B6LK9IO MINGW64 ~/wdc/content/blog (main)
$ mkdir 2026 && cd 2026
12799@DESKTOP-B6LK9IO MINGW64 ~/wdc/content/blog/2026 (main)
$ nano 2026-02-24-fh.md
12799@DESKTOP-B6LK9IO MINGW64 ~/wdc/content/blog/2026 (main)
$ cat 2026-02-24-fh.md
显示:
+++
title = "战马"
+++
此处为html codes或正文
12799@DESKTOP-B6LK9IO MINGW64 ~/wdc/content/blog/2026 (main)
$ cd ~/wdc/

12799@DESKTOP-B6LK9IO MINGW64 ~/wdc (main)
$ ls
LICENSE bin/ config.toml content/ templates/
README.md build/ config_prod.toml public/

12799@DESKTOP-B6LK9IO MINGW64 ~/wdc (main)
$ gozer build
12799@DESKTOP-B6LK9IO MINGW64 ~/wdc (main)
$ cd build

12799@DESKTOP-B6LK9IO MINGW64 ~/wdc/build (main)
$ ls
2023/ code/ img/ privacy-policy/ sitemap.xsl
2025/ contact/ index.html projects/ style.css
404/ donate/ links/ public-key.txt wordpress-plugins/
about/ favicon.ico media/ robots.txt
blog/ feed.xml notebooks/ rss-icon.svg
bookmarks/ hire-me/ now/ sitemap.xml

12799@DESKTOP-B6LK9IO MINGW64 ~/wdc/build (main)
$ python3 -m http.server 2000
在浏览器里,访问http://localhost:2000/,即可看到静态网站的效果。如图:

 访问https://app.netlify.com/drop,然后在电脑上,进入C:\Users\你的用户名\wdc目录,把build目录拖放到

页面https://app.netlify.com/drop里的圆圈里,等待上传完成,上传完成后,
我得到了网址https://jazzy-scone-580b6a.netlify.app/
 https://jazzy-scone-580b6a.netlify.app/blog/。同一天里的帖子是按字母顺序从下到上排列的:
 https://jazzy-scone-580b6a.netlify.app/blog/2026/test/
 https://jazzy-scone-580b6a.netlify.app/blog/2026/fh/
https://jazzy-scone-580b6a.netlify.app/blog/2026/ce/

 


 





 
 

Tuesday, 24 February 2026

awesome-selfhosted-data

 

machine-readable data for https://awesome-selfhosted.net

awesome-selfhosted.net   

This repository holds data used to generate https://awesome-selfhosted.net and https://github.com/awesome-selfhosted/awesome-selfhosted

from  https://github.com/awesome-selfhosted/awesome-selfhosted-data

llm-course

 

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

  Hugging Face • 💻 Blog • 📙 LLM Engineer's Handbook

LLM Engineer's Handbook CoverThe LLM course is divided into three parts:

  1. 🧩 LLM Fundamentals is optional and covers fundamental knowledge about mathematics, Python, and neural networks.
  2. 🧑‍🔬 The LLM Scientist focuses on building the best possible LLMs using the latest techniques.
  3. 👷 The LLM Engineer focuses on creating LLM-based applications and deploying them.

Note

Based on this course, I co-wrote the LLM Engineer's Handbook, a hands-on book that covers an end-to-end LLM application from design to deployment. The LLM course will always stay free, but you can support my work by purchasing this book.

For a more comprehensive version of this course, check out the DeepWiki.

📝 Notebooks

A list of notebooks and articles I wrote about LLMs.

Toggle section (optional)






Open In Colab


Open In Colab


Open In Colab


Open In Colab


Open In Colab


Open In Colab


Open In Colab


Open In Colab








Open In Colab



Open In Colab



Open In Colab



Open In Colab



Open In Colab



Open In Colab








Open In Colab



Open In Colab



Open In Colab



Open In Colab








Open In Colab



Open In Colab



Open In Colab



Open In Colab



Open In Colab

🧩 LLM Fundamentals

This section introduces essential knowledge about mathematics, Python, and neural networks. You might not want to start here but refer to it as needed.

Toggle section (optional)









































🧑‍🔬 The LLM Scientist

This section of the course focuses on learning how to build the best possible LLMs using the latest techniques.

1. The LLM Architecture

An in-depth knowledge of the Transformer architecture is not required, but it's important to understand the main steps of modern LLMs: converting text into numbers through tokenization, processing these tokens through layers including attention mechanisms, and finally generating new text through various sampling strategies.

  • Architectural overview: Understand the evolution from encoder-decoder Transformers to decoder-only architectures like GPT, which form the basis of modern LLMs. Focus on how these models process and generate text at a high level.
  • Tokenization: Learn the principles of tokenization - how text is converted into numerical representations that LLMs can process. Explore different tokenization strategies and their impact on model performance and output quality.
  • Attention mechanisms: Master the core concepts of attention mechanisms, particularly self-attention and its variants. Understand how these mechanisms enable LLMs to process long-range dependencies and maintain context throughout sequences.
  • Sampling techniques: Explore various text generation approaches and their tradeoffs. Compare deterministic methods like greedy search and beam search with probabilistic approaches like temperature sampling and nucleus sampling.

📚 References:

  • Visual intro to Transformers by 3Blue1Brown: Visual introduction to Transformers for complete beginners.
  • LLM Visualization by Brendan Bycroft: Interactive 3D visualization of LLM internals.
  • nanoGPT by Andrej Karpathy: A 2h-long YouTube video to reimplement GPT from scratch (for programmers). He also made a video about tokenization.
  • Attention? Attention! by Lilian Weng: Historical overview to introduce the need for attention mechanisms.
  • Decoding Strategies in LLMs by Maxime Labonne: Provide code and a visual introduction to the different decoding strategies to generate text.

2. Pre-Training Models

Pre-training is a computationally intensive and expensive process. While it's not the focus of this course, it's important to have a solid understanding of how models are pre-trained, especially in terms of data and parameters. Pre-training can also be performed by hobbyists at a small scale with <1B models.

  • Data preparation: Pre-training requires massive datasets (e.g., Llama 3.1 was trained on 15 trillion tokens) that need careful curation, cleaning, deduplication, and tokenization. Modern pre-training pipelines implement sophisticated filtering to remove low-quality or problematic content.
  • Distributed training: Combine different parallelization strategies: data parallel (batch distribution), pipeline parallel (layer distribution), and tensor parallel (operation splitting). These strategies require optimized network communication and memory management across GPU clusters.
  • Training optimization: Use adaptive learning rates with warm-up, gradient clipping, and normalization to prevent explosions, mixed-precision training for memory efficiency, and modern optimizers (AdamW, Lion) with tuned hyperparameters.
  • Monitoring: Track key metrics (loss, gradients, GPU stats) using dashboards, implement targeted logging for distributed training issues, and set up performance profiling to identify bottlenecks in computation and communication across devices.

📚 References:

  • FineWeb by Penedo et al.: Article to recreate a large-scale dataset for LLM pretraining (15T), including FineWeb-Edu, a high-quality subset.
  • RedPajama v2 by Weber et al.: Another article and paper about a large-scale pre-training dataset with a lot of interesting quality filters.
  • nanotron by Hugging Face: Minimalistic LLM training codebase used to make SmolLM2.
  • Parallel training by Chenyan Xiong: Overview of optimization and parallelism techniques.
  • Distributed training by Duan et al.: A survey about efficient training of LLM on distributed architectures.
  • OLMo 2 by AI2: Open-source language model with model, data, training, and evaluation code.
  • LLM360 by LLM360: A framework for open-source LLMs with training and data preparation code, data, metrics, and models.

3. Post-Training Datasets

Post-training datasets have a precise structure with instructions and answers (supervised fine-tuning) or instructions and chosen/rejected answers (preference alignment). Conversational structures are a lot rarer than the raw text used for pre-training, which is why we often need to process seed data and refine it to improve the accuracy, diversity, and complexity of the samples. More information and examples are available in my repo 💾 LLM Datasets.

  • Storage & chat templates: Because of the conversational structure, post-training datasets are stored in a specific format like ShareGPT or OpenAI/HF. Then, these formats are mapped to a chat template like ChatML or Alpaca to produce the final samples that the model is trained on.
  • Synthetic data generation: Create instruction-response pairs based on seed data using frontier models like GPT-4o. This approach allows for flexible and scalable dataset creation with high-quality answers. Key considerations include designing diverse seed tasks and effective system prompts.
  • Data enhancement: Enhance existing samples using techniques like verified outputs (using unit tests or solvers), multiple answers with rejection sampling, Auto-Evol, Chain-of-Thought, Branch-Solve-Merge, personas, etc.
  • Quality filtering: Traditional techniques involve rule-based filtering, removing duplicates or near-duplicates (with MinHash or embeddings), and n-gram decontamination. Reward models and judge LLMs complement this step with fine-grained and customizable quality control.

📚 References:

  • Synthetic Data Generator by Argilla: Beginner-friendly way of building datasets using natural language in a Hugging Face space.
  • LLM Datasets by Maxime Labonne: Curated list of datasets and tools for post-training.
  • NeMo-Curator by Nvidia: Dataset preparation and curation framework for pre- and post-training data.
  • Distilabel by Argilla: Framework to generate synthetic data. It also includes interesting reproductions of papers like UltraFeedback.
  • Semhash by MinishLab: Minimalistic library for near-deduplication and decontamination with a distilled embedding model.
  • Chat Template by Hugging Face: Hugging Face's documentation about chat templates.

4. Supervised Fine-Tuning

SFT turns base models into helpful assistants, capable of answering questions and following instructions. During this process, they learn how to structure answers and reactivate a subset of knowledge learned during pre-training. Instilling new knowledge is possible but superficial: it cannot be used to learn a completely new language. Always prioritize data quality over parameter optimization.

  • Training techniques: Full fine-tuning updates all model parameters but requires significant compute. Parameter-efficient fine-tuning techniques like LoRA and QLoRA reduce memory requirements by training a small number of adapter parameters while keeping base weights frozen. QLoRA combines 4-bit quantization with LoRA to reduce VRAM usage. These techniques are all implemented in the most popular fine-tuning frameworks: TRL, Unsloth, and Axolotl.
  • Training parameters: Key parameters include learning rate with schedulers, batch size, gradient accumulation, number of epochs, optimizer (like 8-bit AdamW), weight decay for regularization, and warmup steps for training stability. LoRA also adds three parameters: rank (typically 16-128), alpha (1-2x rank), and target modules.
  • Distributed training: Scale training across multiple GPUs using DeepSpeed or FSDP. DeepSpeed provides three ZeRO optimization stages with increasing levels of memory efficiency through state partitioning. Both methods support gradient checkpointing for memory efficiency.
  • Monitoring: Track training metrics including loss curves, learning rate schedules, and gradient norms. Monitor for common issues like loss spikes, gradient explosions, or performance degradation.

📚 References:

  • Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth by Maxime Labonne: Hands-on tutorial on how to fine-tune a Llama 3.1 model using Unsloth.
  • Axolotl - Documentation by Wing Lian: Lots of interesting information related to distributed training and dataset formats.
  • Mastering LLMs by Hamel Husain: Collection of educational resources about fine-tuning (but also RAG, evaluation, applications, and prompt engineering).
  • LoRA insights by Sebastian Raschka: Practical insights about LoRA and how to select the best parameters.

5. Preference Alignment

Preference alignment is a second stage in the post-training pipeline, focused on aligning generated answers with human preferences. This stage was designed to tune the tone of LLMs and reduce toxicity and hallucinations. However, it has become increasingly important to also boost their performance and improve their usefulness. Unlike SFT, there are many preference alignment algorithms. Here, we'll focus on the three most important ones: DPO, GRPO, and PPO.

  • Rejection sampling: For each prompt, use the trained model to generate multiple responses, and score them to infer chosen/rejected answers. This creates on-policy data, where both responses come from the model being trained, improving alignment stability.
  • Direct Preference Optimization Directly optimizes the policy to maximize the likelihood of chosen responses over rejected ones. It doesn't require reward modeling, which makes it more computationally efficient than RL techniques but slightly worse in terms of quality. Great for creating chat models.
  • Reward model: Train a reward model with human feedback to predict metrics like human preferences. It can leverage frameworks like TRL, verl, and OpenRLHF for scalable training.
  • Reinforcement Learning: RL techniques like GRPO and PPO iteratively update a policy to maximize rewards while staying close to the initial behavior. They can use a reward model or reward functions to score responses. They tend to be computationally expensive and require careful tuning of hyperparameters, including learning rate, batch size, and clip range. Ideal for creating reasoning models.

📚 References:


6. Evaluation

Reliably evaluating LLMs is a complex but essential task guiding data generation and training. It provides invaluable feedback about areas of improvement, which can be leveraged to modify the data mixture, quality, and training parameters. However, it's always good to remember Goodhart's law: "When a measure becomes a target, it ceases to be a good measure."

  • Automated benchmarks: Evaluate models on specific tasks using curated datasets and metrics, like MMLU. It works well for concrete tasks but struggles with abstract and creative capabilities. It is also prone to data contamination.
  • Human evaluation: It involves humans prompting models and grading responses. Methods range from vibe checks to systematic annotations with specific guidelines and large-scale community voting (arena). It is more suited for subjective tasks and less reliable for factual accuracy.
  • Model-based evaluation: Use judge and reward models to evaluate model outputs. It highly correlates with human preferences but suffers from bias toward their own outputs and inconsistent scoring.
  • Feedback signal: Analyze error patterns to identify specific weaknesses, such as limitations in following complex instructions, lack of specific knowledge, or susceptibility to adversarial prompts. This can be improved with better data generation and training parameters.

📚 References:

  • LLM evaluation guidebook by Hugging Face: Comprehensive guide about evaluation with practical insights.
  • Open LLM Leaderboard by Hugging Face: Main leaderboard to compare LLMs in an open and reproducible way (automated benchmarks).
  • Language Model Evaluation Harness by EleutherAI: A popular framework for evaluating LLMs using automated benchmarks.
  • Lighteval by Hugging Face: Alternative evaluation framework that also includes model-based evaluations.
  • Chatbot Arena by LMSYS: Elo rating of general-purpose LLMs, based on comparisons made by humans (human evaluation).

7. Quantization

Quantization is the process of converting the parameters and activations of a model to a lower precision. For example, weights stored using 16 bits can be converted into a 4-bit representation. This technique has become increasingly important to reduce the computational and memory costs associated with LLMs.

  • Base techniques: Learn the different levels of precision (FP32, FP16, INT8, etc.) and how to perform naïve quantization with absmax and zero-point techniques.
  • GGUF & llama.cpp: Originally designed to run on CPUs, llama.cpp and the GGUF format have become the most popular tools to run LLMs on consumer-grade hardware. It supports storing special tokens, vocabulary, and metadata in a single file.
  • GPTQ & AWQ: Techniques like GPTQ/EXL2 and AWQ introduce layer-by-layer calibration that retains performance at extremely low bitwidths. They reduce catastrophic outliers using dynamic scaling, selectively skipping or re-centering the heaviest parameters.
  • SmoothQuant & ZeroQuant: New quantization-friendly transformations (SmoothQuant) and compiler-based optimizations (ZeroQuant) help mitigate outliers before quantization. They also reduce hardware overhead by fusing certain ops and optimizing dataflow.

📚 References:


8. New Trends

Here are notable topics that didn't fit into other categories. Some are established techniques (model merging, multimodal), but others are more experimental (interpretability, test-time compute scaling) and the focus of numerous research papers.

  • Model merging: Merging trained models has become a popular way of creating performant models without any fine-tuning. The popular mergekit library implements the most popular merging methods, like SLERP, DARE, and TIES.
  • Multimodal models: These models (like CLIP, Stable Diffusion, or LLaVA) process multiple types of inputs (text, images, audio, etc.) with a unified embedding space, which unlocks powerful applications like text-to-image.
  • Interpretability: Mechanistic interpretability techniques like Sparse Autoencoders (SAEs) have made remarkable progress to provide insights about the inner workings of LLMs. This has also been applied with techniques such as abliteration, which allow you to modify the behavior of models without training.
  • Test-time compute: Reasoning models trained with RL techniques can be further improved by scaling the compute budget during test time. It can involve multiple calls, MCTS, or specialized models like a Process Reward Model (PRM). Iterative steps with precise scoring significantly improve performance for complex reasoning tasks.

📚 References:

👷 The LLM Engineer

This section of the course focuses on learning how to build LLM-powered applications that can be used in production, with a focus on augmenting models and deploying them.

1. Running LLMs

Running LLMs can be difficult due to high hardware requirements. Depending on your use case, you might want to simply consume a model through an API (like GPT-4) or run it locally. In any case, additional prompting and guidance techniques can improve and constrain the output for your applications.

  • LLM APIs: APIs are a convenient way to deploy LLMs. This space is divided between private LLMs (OpenAI, Google, Anthropic, etc.) and open-source LLMs (OpenRouter, Hugging Face, Together AI, etc.).
  • Open-source LLMs: The Hugging Face Hub is a great place to find LLMs. You can directly run some of them in Hugging Face Spaces, or download and run them locally in apps like LM Studio or through the CLI with llama.cpp or ollama.
  • Prompt engineering: Common techniques include zero-shot prompting, few-shot prompting, chain of thought, and ReAct. They work better with bigger models, but can be adapted to smaller ones.
  • Structuring outputs: Many tasks require a structured output, like a strict template or a JSON format. Libraries like Outlines can be used to guide the generation and respect a given structure. Some APIs also support structured output generation natively using JSON schemas.

📚 References:


2. Building a Vector Storage

Creating a vector storage is the first step to building a Retrieval Augmented Generation (RAG) pipeline. Documents are loaded, split, and relevant chunks are used to produce vector representations (embeddings) that are stored for future use during inference.

  • Ingesting documents: Document loaders are convenient wrappers that can handle many formats: PDF, JSON, HTML, Markdown, etc. They can also directly retrieve data from some databases and APIs (GitHub, Reddit, Google Drive, etc.).
  • Splitting documents: Text splitters break down documents into smaller, semantically meaningful chunks. Instead of splitting text after n characters, it's often better to split by header or recursively, with some additional metadata.
  • Embedding models: Embedding models convert text into vector representations. Picking task-specific models significantly improves performance for semantic search and RAG.
  • Vector databases: Vector databases (like Chroma, Pinecone, Milvus, FAISS, Annoy, etc.) are designed to store embedding vectors. They enable efficient retrieval of data that is 'most similar' to a query based on vector similarity.

📚 References:


3. Retrieval Augmented Generation

With RAG, LLMs retrieve contextual documents from a database to improve the accuracy of their answers. RAG is a popular way of augmenting the model's knowledge without any fine-tuning.

  • Orchestrators: Orchestrators like LangChain and LlamaIndex are popular frameworks to connect your LLMs with tools and databases. The Model Context Protocol (MCP) introduces a new standard to pass data and context to models across providers.
  • Retrievers: Query rewriters and generative retrievers like CoRAG and HyDE enhance search by transforming user queries. Multi-vector and hybrid retrieval methods combine embeddings with keyword signals to improve recall and precision.
  • Memory: To remember previous instructions and answers, LLMs and chatbots like ChatGPT add this history to their context window. This buffer can be improved with summarization (e.g., using a smaller LLM), a vector store + RAG, etc.
  • Evaluation: We need to evaluate both the document retrieval (context precision and recall) and the generation stages (faithfulness and answer relevancy). It can be simplified with tools Ragas and DeepEval (assessing quality).

📚 References:


4. Advanced RAG

Real-life applications can require complex pipelines, including SQL or graph databases, as well as automatically selecting relevant tools and APIs. These advanced techniques can improve a baseline solution and provide additional features.

  • Query construction: Structured data stored in traditional databases requires a specific query language like SQL, Cypher, metadata, etc. We can directly translate the user instruction into a query to access the data with query construction.
  • Tools: Agents augment LLMs by automatically selecting the most relevant tools to provide an answer. These tools can be as simple as using Google or Wikipedia, or more complex, like a Python interpreter or Jira.
  • Post-processing: Final step that processes the inputs that are fed to the LLM. It enhances the relevance and diversity of documents retrieved with re-ranking, RAG-fusion, and classification.
  • Program LLMs: Frameworks like DSPy allow you to optimize prompts and weights based on automated evaluations in a programmatic way.

📚 References:


5. Agents

An LLM agent can autonomously perform tasks by taking actions based on reasoning about its environment, typically through the use of tools or functions to interact with external systems.

  • Agent fundamentals: Agents operate using thoughts (internal reasoning to decide what to do next), action (executing tasks, often by interacting with external tools), and observation (analyzing feedback or results to refine the next step).
  • Agent protocols: Model Context Protocol (MCP) is the industry standard for connecting agents to external tools and data sources with MCP servers and clients. More recently, Agent2Agent (A2A) tries to standardize a common language for agent interoperability.
  • Vendor frameworks: Each major cloud model provider has its own agentic framework with OpenAI SDK, Google ADK, and Claude Agent SDK if you're particularly tied to one vendor.
  • Other frameworks: Agent development can be streamlined using different frameworks like LangGraph (design and visualization of workflows) LlamaIndex (data-augmented agents with RAG), or custom solutions. More experimental frameworks include collaboration between different agents, such as CrewAI (role-based team workflows) and AutoGen (conversation-driven multi-agent systems).

📚 References:

  • Agents Course: Popular course about AI agents made by Hugging Face.
  • LangGraph: Overview of how to build AI agents with LangGraph.
  • LlamaIndex Agents: Uses cases and resources to build agents with LlamaIndex.

6. Inference optimization

Text generation is a costly process that requires expensive hardware. In addition to quantization, various techniques have been proposed to maximize throughput and reduce inference costs.

  • Flash Attention: Optimization of the attention mechanism to transform its complexity from quadratic to linear, speeding up both training and inference.
  • Key-value cache: Understand the key-value cache and the improvements introduced in Multi-Query Attention (MQA) and Grouped-Query Attention (GQA).
  • Speculative decoding: Use a small model to produce drafts that are then reviewed by a larger model to speed up text generation. EAGLE-3 is a particularly popular solution.

📚 References:

  • GPU Inference by Hugging Face: Explain how to optimize inference on GPUs.
  • LLM Inference by Databricks: Best practices for how to optimize LLM inference in production.
  • Optimizing LLMs for Speed and Memory by Hugging Face: Explain three main techniques to optimize speed and memory, namely quantization, Flash Attention, and architectural innovations.
  • Assisted Generation by Hugging Face: HF's version of speculative decoding. It's an interesting blog post about how it works with code to implement it.
  • EAGLE-3 paper: Introduces EAGLE-3 and reports speedups up to 6.5×.
  • Speculators: Library made by vLLM for building, evaluating, and storing speculative decoding algorithms (e.g., EAGLE-3) for LLM inference.

7. Deploying LLMs

Deploying LLMs at scale is an engineering feat that can require multiple clusters of GPUs. In other scenarios, demos and local apps can be achieved with much lower complexity.

  • Local deployment: Privacy is an important advantage that open-source LLMs have over private ones. Local LLM servers (LM Studio, Ollama, oobabooga, kobold.cpp, etc.) capitalize on this advantage to power local apps.
  • Demo deployment: Frameworks like Gradio and Streamlit are helpful to prototype applications and share demos. You can also easily host them online, for example, using Hugging Face Spaces.
  • Server deployment: Deploying LLMs at scale requires cloud (see also SkyPilot) or on-prem infrastructure and often leverages optimized text generation frameworks like TGI, vLLM, etc.
  • Edge deployment: In constrained environments, high-performance frameworks like MLC LLM and mnn-llm can deploy LLM in web browsers, Android, and iOS.

📚 References:


8. Securing LLMs

In addition to traditional security problems associated with software, LLMs have unique weaknesses due to the way they are trained and prompted.

  • Prompt hacking: Different techniques related to prompt engineering, including prompt injection (additional instruction to hijack the model's answer), data/prompt leaking (retrieve its original data/prompt), and jailbreaking (craft prompts to bypass safety features).
  • Backdoors: Attack vectors can target the training data itself, by poisoning the training data (e.g., with false information) or creating backdoors (secret triggers to change the model's behavior during inference).
  • Defensive measures: The best way to protect your LLM applications is to test them against these vulnerabilities (e.g., using red teaming and checks like garak) and observe them in production (with a framework like langfuse).

📚 References:


Acknowledgements

This roadmap was inspired by the excellent DevOps Roadmap from Milan Milanović and Romano Roth.

Special thanks to:

  • Thomas Thelen for motivating me to create a roadmap
  • André Frade for his input and review of the first draft
  • Dino Dunn for providing resources about LLM security
  • Magdalena Kuhn for improving the "human evaluation" part
  • Odoverdose for suggesting 3Blue1Brown's video about Transformers
  • Everyone who contributed to the educational references in this course :)

from  https://github.com/mlabonne/llm-course

SaaS-Starter-Stack

 

Free and Affordable Tools for Building a SaaS


A curated list of free and affordable tools for building a SaaS.

Get your SaaS up and running in no time with this list of free and affordable tools. Contribute.

Table of Contents

Guide

This guide is aimed at guiding you through the whole journey of building your own startup based on my learnings of growing my own startup Pallyy.

Interviews

Learn even more from reading interviews from SaaS founders with at least $500 MRR. Coming soon. To submit an interview, read the guidelines.

  • Talknotes - AI note taking app by Nico Jeannen doing $3.5K MRR.
  • Gliglish - Learn languages with AI by Fabien Snauwaert doing $8K MRR.
  • Plausible - Website analytics by Marko Sarik and Uku doing over $100K MRR.
  • PDFai - Chat with PDF tool by Damon Chen doing over $50K MRR.
  • Publer - Social scheduling platform by Ervin Kalemi doing $170K MRR.
  • Simple Analytics - Privacy-friendly analytics doing $30K MRR.

Tools

Code

  • Astro - The web framework for content-driven websites.
  • Nuxt - The intuitive Vue framework.
  • Next.js - The React Framework for the Web
  • Remix - Focused on web standards and modern web app UX
  • Sveltekit - Web development, streamlined.

Boilerplate Starter Kits

  • BoxyHQ - Enterprise ready, open source, and powered by SAML Jackson.
  • Just Launch It - Sveltekit boilerplate.
  • LaraFast - Laravel boilerplate with ready-to-go components.
  • LaunchFast - Astro, Next.js, and SvelteKit boilerplates for.
  • Nextless.js - Next.js Boilerplate with Auth, Multi-tenancy & Team, etc.
  • SaaS Pegasus - The premier SaaS boilerplate for Python and Django.
  • ShipFast - NextJS boilerplate.
  • Shipped.club - NextJS Startup Boilerplate with Chrome Extension.
  • Ionstarter - Ionic starter templates to launch apps.
  • RapidLaunch - Nuxt.js boilerplate.
  • Supastarter - Production-ready SaaS starter kit for Next.js 14 and Nuxt 3.
  • React Native Boilerplate - Mobile SaaS Boilerplate to launch on iOS and Android.
  • DevToDollars - Open-source Flutter boilerplate.
  • Shipixen - Next.js boilerplates with an MDX blog, TypeScript and Shadcn UI

Databases

  • Appwrite - Open-source backend-as-a-service platform for databases.
  • MongoDB - Developer data platform (NoSQL).
  • Pocketbase - Open Source backend in 1 file.
  • Prisma - Simple db interactions via the ORM, + connection pooling & edge caching, + type-safe db events
  • Supabase - Open Source Firebase Alternative

Hosting

  • Render - Build, deploy, and scale your apps.
  • Vercel - Build, scale, and secure a faster, personalized web.
  • Railway - Instant Deployments, Effortless Scale
  • Netlify - Connect everything. Build anything.
  • Zeabur - Deploy painlessly and scale infinitely.
  • Hetzner - Low-cost dedicated server for self-hosting.
  • Coolify - Open-source Vercel and Netlify alternative.

Subscriptions and Payments

  • Stripe - Financial infrastructure for the internet.
  • Lemon Squeezy - Payments, tax & subscriptions.
  • Paddle - The complete payments, tax, and subscriptions solution.

Knowledge Base and Help Center

  • HelpKit - Turn Notion into a Help Center / Documentation Site.
  • Bliberoo - Help center, internal wiki or API documetation.

Live Chat

  • Crisp - All-in-one business messaging platform.
  • Intercom - AI customer service solution.
  • JustReply - Customer support tool for teams using Slack.
  • Tawk - Free Live Chat, Ticketing, Knowledge Base & CRM.

Chatbots

  • Mevo - Chatbot builder with AI and rule-based options.

Social Media Management

  • Pallyy - Scheduling platform for brands and agencies.
  • Buffer- Grow your audience on social and beyond.
  • StoryChief - Content Marketing Platform for marketing teams.

Blogging

  • Blogkit - Blogging starter kits for Next.js with WordPress, Directus, Contentlayer & MDX.
  • BlogPro - Notion to Blog for startups.
  • Docs to Markdown Pro - Publish Google Docs as Markdown to GitHub/GitLab.
  • Docs to WP Pro - Publish SEO-optimized WordPress posts from Google Docs.
  • Quotion - Apple Notes to Blog in minutes, built-in web analytics, SEO-ready.

Link Shortening

  • Dub - Open-source link management.
  • URLR - Reliable and GDPR-compliant link shortener.

Media Processing and Content Delivery Networks

  • Transloadit - Receive, transform, or deliver any file.
  • ImageKit - Real-time image and video optimizations, transformations.

Website Analytics

  • Beam - Google Analytics alternative.
  • Fathom - Excellent Google Analytics Alternative
  • Penkle - EU Based privacy focused analytics.
  • Pirsch - Cookie-free and Privacy-friendly Web Analytics.
  • Plausible - Privacy first analytics.
  • Simple - EU Based compliancy focused.
  • Umami - Empowering insights, Preserving privacy.
  • Usermaven - Free, privacy-friendly website analytics and product insights.

Website Monitoring

  • DataDog - See inside any stack, any app, at any scale, anywhere.
  • OpenStatus - The open-source website & API monitoring platform.
  • Sentry - Fully integrated, multi-environment performance monitoring & error tracking.

User Feedback

  • Canny - Capture product feedback.
  • featureOS - Organize product feedback and analyze with AI.
  • Supahub - Collect feedback & announce product updates.

SMS Notifications

  • Notilify - Send marketing, transactional, notifications, and more.
  • Twilio - Industry leading customer management platform.

Push Notifications

Affiliates

  • PromoteKit - Affiliate software for Stripe.
  • Rewardful - Set up affiliate and customer referral programs for Stripe.
  • Tolt - Affiliate software for Paddle, Stripe and Chargebee.

Email Notifications

  • Resend - Email for developers.
  • Mailgun - Email service providing API, SMTP.
  • Plunk - The Email Platform for SaaS.
  • Postmark - Developer friendly Email Delivery Service.

Event Scheduling

  • Cal.com - Scheduling Infrastructure for Everyone.

Authentification and User Management

  • BoxyHQ - Open source security building blocks for developers.
  • Clerk - The most comprehensive User Management Platform

CRM

  • Wobaka - Refreshingly simple CRM and email automation

Form Builders

  • Tally - The simplest free online form builder.
  • Youform - Create waitlist forms, surveys and more for free.

Website Builders

  • Versoly - The fastest way to build your pixel perfect website for free.

from  https://github.com/timb-103/saas-starter-stack/