Machine learning
Models, frameworks, training infra
100 repositories
Learn it. Build it. Ship it for others.
FinceptTerminal is a modern finance application offering advanced market analytics, investment research, and economic data tools, designed for interactive exploration and data-driven decision-making in a user-friendly environment.
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
We write your reusable computer vision tools. 💜
A high-throughput and memory-efficient inference and serving engine for LLMs
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works with Claude Code, Codex, OpenClaw, or any LLM agent.
High-performance AI pipeline engine with a C++ core and 50+ Python-extensible nodes. Build, debug, and scale LLM workflows with 13+ model providers, 8+ vector databases, and agent orchestration, all from your IDE. Includes VS Code extension, TypeScript/Python SDKs, and Docker deployment.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning
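Programmer Yupi's AI resource collection plus a beginner-level Vibe Coding tutorial: a step-by-step OpenClaw guide, ways to use large models (DeepSeek / GPT / Gemini / Claude), the latest AI news, a prompt collection, an AI knowledge base (Agent Skills / RAG / MCP / A2A), AI programming tutorials (Harness Engineering), AI tool usage (Cursor / Claude Code / TRAE / Codex / Copilot), AI development framework tutorials (Spring AI / LangChain), and a guide to monetizing AI products, to help you pick up AI technology quickly and stay at the cutting edge. This project is open-source documentation and has been upgraded into the Yupi AI navigation website.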
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model with multiplatform CPU, AMD, NVIDIA GPU PyTorch support, handling, and auto-stitching
Financial data platform for analysts, quants and AI agents.
Ultralytics YOLO 🚀
Production-grade Rust-native trading engine with deterministic event-driven architecture
A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
A fully open-source humanoid arm for physical AI research and deployment in contact-rich environments.
f.k.a. Awesome ChatGPT Prompts. Share, discover, and collect prompts from the community. Free and open source — self-host for your organization with complete privacy.
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
Open-source deep-learning framework for exploring, building and deploying AI weather/climate workflows.
The official Python client for the Hugging Face Hub.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Drop-in Apache Spark replacement written in Rust, unifying batch processing, stream processing, and compute-intensive AI workloads.
List of Computer Science courses with video lectures.
Image annotation with Python. Supports polygon, rectangle, circle, line, point, and AI-assisted annotation.
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch with more integrations coming.
YC (S26) | Give AI the ability to live your experience. Records everything you do, say, hear 24/7, local, private, secure
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.
ML-powered manga translator, written in Rust.
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).
Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
A hyperparameter optimization framework
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
A framework for efficient model inference with omni-modality models
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while controlling costs and managing access to models and data.
Improve your resumes with Resume Matcher. Get insights, keyword suggestions and tune your resumes to job descriptions.
A community-supported supercharged document management system: scan, index and archive all your documents
Apache Superset is a Data Visualization and Data Exploration Platform
Tesseract Open Source OCR Engine (main repository)
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
scikit-learn: machine learning in Python
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
A toolkit to run Ray applications on Kubernetes
The fastest path to AI-powered full stack observability, even for lean teams.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
XR Blocks is a lightweight WebXR + AI library for rapidly prototyping advanced AI + XR experiences.
Streamlit — A faster way to build and share data apps.
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
SwarmUI (formerly StableSwarmUI), a modular Stable Diffusion web user interface with an emphasis on easily accessible powertools, high performance, and extensibility.
A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor.
Open Source Feature Flags, Experimentation, and Product Analytics
Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
Cross-platform, customizable ML solutions for live and streaming media.
C* (C-Asterisk) is a custom, high-performance programming language. It uses LLVM and native I/O bypasses to run Deep Learning models from scratch 100x faster than Python.
RDNA-native LLM inference engine in Rust.
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Local-first control plane for data scripts, SQL, containers, SSH commands, and AI agent workflows. One binary, file-backed state, no external database or broker. Use your favorite AI agents to create, update, and run your workflows via secure MCP with traceability and logging.
Lightning ⚡️ fast forecasting with statistical and econometric models.
Refine high-quality datasets and visual AI models
An Open Source Machine Learning Framework for Everyone
Visualizer for neural network, deep learning and machine learning models
Tensor library for machine learning
Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.
NVR with realtime local object detection for IP cameras
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
ncnn is a high-performance neural network inference framework optimized for the mobile platform
The nnsight package enables interpreting and manipulating the internals of deep learned models.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
Open Source Continuous Inference Benchmarking Qwen3.5, DeepSeek, GPTOSS - GB200 NVL72 vs MI355X vs B200 vs GB300 NVL72 vs H100 & soon™ TPUv6e/v7/Trainium2/3
⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / verl / LLaMA Factory / ms-swift / Ultralytics / MMEngine / Keras etc.
Time-Series Work Summary in CS Top Conferences (NIPS, ICML, ICLR, KDD, AAAI, WWW, IJCAI, CIKM, ICDM, ICDE, etc.)
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.
Open Machine Learning Compiler Framework
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
Fit interpretable models. Explain blackbox machine learning.
matplotlib: plotting with Python
Memory layer for AI Agents. Replace complex RAG pipelines with a serverless, single-file memory layer. Give your agents instant retrieval and long-term memory.
Machine Learning Systems
🕸️ Web apps in pure Python 🐍
Computer Vision Annotation Tool (CVAT) is a leading platform for building high-quality visual datasets for vision AI. It offers open-source, cloud, and enterprise products, as well as labeling services, for image, video, and 3D annotation with AI-assisted labeling, quality assurance, team collaboration, analytics, and developer APIs.
An orchestration platform for the development, production, and observation of data assets.
A portable accelerated SQL query, search, and LLM-inference engine, written in Rust, for data-grounded AI apps and agents.
Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.
Machine Learning Engineering Open Book
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Open standard for machine learning interoperability
The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
LabPlot is a FREE, open source and cross-platform Data Visualization and Analysis software accessible to everyone.
Curated list of the best truly open-source AI projects, models, tools, and infrastructure.
🤖 Automatically collected AI repos, tools, websites, papers & tutorials. A practical AI toolbox 💎