Top #data Tools & Software
Explore 64 hand-picked tools and software tagged with data — ranked by popularity and community signals.
public-apis
A collective list of free APIs
openclaw
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
n8n
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
netdata
The fastest path to AI-powered full stack observability, even for lean teams.
awesome-scalability
The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
scikit-learn
scikit-learn: machine learning in Python
pathway
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
30-Days-Of-Python
The 30 Days of Python programming challenge is a step-by-step guide to learning the Python programming language in 30 days. The challenge may take more than 100 days, so follow your own pace. These videos may help too: https://www.youtube.com/channel/UC7PNRuno1rzYPb1xLa4yktw
MinerU
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
memos
Open-source, self-hosted note-taking tool built for quick capture. Markdown-native, lightweight, and fully yours.
TrendRadar
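⭐ AI-driven public opinion & trend monitor with multi-platform aggregation, RSS, and smart alerts. 🎯 Say goodbye to information overload: your AI public-opinion monitoring assistant and trending-topic filter. Aggregates trending topics from multiple platforms plus RSS subscriptions, with precise keyword filtering. AI news filtering, AI translation, and AI analysis briefings are pushed straight to your phone, and it can also connect to an MCP architecture to support natural-language conversational analysis, sentiment insight, and trend prediction. Supports Docker, with data held locally or in your own cloud. Integrates smart push notifications through WeChat, Feishu, DingTalk, Telegram, email, ntfy, bark, Slack, and other channels.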
etcd
Distributed reliable key-value store for the most critical data of a distributed system
llama_index
LlamaIndex is the leading document agent and OCR platform
pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
streamlit
Streamlit: a faster way to build and share data apps.
gradio
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
ColossalAI
Making large AI models cheaper, faster and more accessible
BettaFish
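微舆 (Weiyu): a multi-agent public-opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the full picture of public opinion, predicts future trends, and supports decision-making. Implemented from scratch, with no dependence on any framework.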
tidb
TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
quivr
Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: PGVector, Faiss. Any Files. Any way you want.
mindsdb
Query Engine for AI Analytics: Build self-reasoning agents across all your live data
Scrapling
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!