Top #data Tools & Software
Explore 64 hand-picked tools and software tagged with data, ranked by popularity and community signals.
Retrieval-based-Voice-Conversion-WebUI
GitHub: Easily train a good voice conversion (VC) model with 10 minutes or less of voice data!
spaCy
GitHub: Industrial-strength Natural Language Processing (NLP) in Python
dokploy
GitHub: Open-source alternative to Vercel, Netlify, and Heroku.
posthog
GitHub: PostHog is an all-in-one developer platform for building successful products. We offer product analytics, web analytics, session replay, error tracking, feature flags, experimentation, surveys, a data warehouse, a customer data platform (CDP), and an AI product assistant to help debug your code, ship features faster, and keep all your usage and customer data in one stack.
cockroach
GitHub: CockroachDB, the cloud-native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
interactive-coding-challenges
GitHub: 120+ interactive Python coding interview challenges (algorithms and data structures). Includes Anki flashcards.
ML-From-Scratch
GitHub: Machine Learning From Scratch. Bare-bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
pytorch-lightning
GitHub: Pretrain and fine-tune any AI model of any size on 1 or 10,000+ GPUs with zero code changes.
EasyOCR
GitHub: Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
data-science-ipython-notebooks
GitHub: Data science Python notebooks covering deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command-line tools.
d2l-en
GitHub: Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities in 70 countries, including Stanford, MIT, Harvard, and Cambridge.
redash
GitHub: Make your company data-driven. Connect to any data source, then easily visualize, build dashboards from, and share your data.
kestra
GitHub: Event-driven orchestration and scheduling platform for mission-critical applications.
gitleaks
GitHub: Find secrets such as passwords, API keys, and tokens in git repositories with Gitleaks.
prefect
GitHub: Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
taipy
GitHub: Turns data and AI algorithms into production-ready web applications in no time.
maxun
GitHub: The open-source no-code platform for web scraping, crawling, search, and AI data extraction. Turn websites into structured APIs in minutes.
awesome-mlops
GitHub: A curated list of references for MLOps.
RD-Agent
GitHub: Research and development (R&D) is crucial for improving industrial productivity, especially in the AI era, where R&D focuses mainly on data and models. We are committed to automating these high-value, generic R&D processes through R&D-Agent, which lets AI drive data-driven AI. Technical report: https://aka.ms/RD-Agent-Tech-Report
encore
GitHub: Open-source framework for building robust, type-safe distributed systems with declarative infrastructure.
test-your-sysadmin-skills
GitHub: A collection of Linux sysadmin test questions and answers. Test your knowledge and skills in different fields with these Q&As.
pipedream
GitHub: Connect APIs, remarkably fast. Free for developers.
tpot
GitHub: A Python automated machine learning (AutoML) tool that optimizes machine learning pipelines using genetic programming.
miller
GitHub: Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON.