GitHub Trends
10.1K subscribers
15.3K links
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis
#typescript #analytics #apache #apache_superset #asf #bi #business_analytics #business_intelligence #data_analysis #data_analytics #data_engineering #data_science #data_visualization #data_viz #flask #python #react #sql_editor #superset

Superset is a powerful business intelligence tool that helps you explore and visualize data easily. It offers a no-code interface for building charts, a robust SQL Editor for advanced queries, and support for nearly any SQL database or data engine. You can create beautiful visualizations, define custom dimensions and metrics quickly, and use a lightweight caching layer to reduce database load. Superset also provides extensible security roles and authentication options, an API for customization, and a cloud-native architecture designed for scale. This makes it easier to analyze and present your data in a user-friendly way, replacing or augmenting proprietary BI tools effectively.

https://github.com/apache/superset
#python #data_analysis #data_science #data_visualization #deep_learning #deploy #gradio #gradio_interface #hacktoberfest #interface #machine_learning #models #python #python_notebook #ui #ui_components

Gradio is a Python package that helps you quickly build and share web demos for your machine learning models or any Python function. You don't need to know JavaScript, CSS, or web hosting to use it. With just a few lines of Python code, you can create a demo and share it via a public link. Gradio offers various tools like the `Interface` class for simple demos, `ChatInterface` for chatbots, and `Blocks` for more complex custom applications. It also allows easy sharing of your demos with others by generating a public URL in seconds. This makes it easy to showcase your work without technical hassle.

https://github.com/gradio-app/gradio
#jupyter_notebook #data_analysis #data_science #data_visualization #pandas #python

This curriculum is designed to help beginners learn data science over 10 weeks with 20 detailed lessons. Each lesson includes pre- and post-lesson quizzes, step-by-step guides, knowledge checks, and assignments to ensure you retain the information. You'll learn about data ethics, statistics, working with different types of data, data visualization, and the entire data science lifecycle. The project-based approach helps you build practical skills while learning. Additionally, there are resources for students and teachers to make the learning process flexible and engaging. This curriculum is beneficial because it provides a structured and interactive way to gain hands-on experience in data science, making it easier to understand and apply these skills in real-world scenarios.

https://github.com/microsoft/Data-Science-For-Beginners
#python #analytics #dagster #data_engineering #data_integration #data_orchestrator #data_pipelines #data_science #etl #metadata #mlops #orchestration #python #scheduler #workflow #workflow_automation

Dagster is a tool that helps you manage and automate your data workflows. You can define your data assets, like tables or machine learning models, using Python functions. Dagster then runs these functions at the right time and keeps your data up-to-date. It offers features like integrated lineage and observability, making it easier to track and manage your data. This tool is useful for every stage of data development, from local testing to production, and it integrates well with other popular data tools. Using Dagster, you can build reusable components, spot data quality issues early, and scale your data pipelines efficiently. This makes your work more productive and helps maintain control over complex data systems.

https://github.com/dagster-io/dagster
#jupyter_notebook #aws #data_science #deep_learning #examples #inference #jupyter_notebook #machine_learning #mlops #reinforcement_learning #sagemaker #training

SageMaker-Core is a new Python SDK for Amazon SageMaker that makes it easier to work with machine learning resources. It provides an object-oriented interface, which means you can manage resources like training jobs, models, and endpoints more intuitively. The SDK simplifies code by allowing resource chaining, eliminating the need to manually specify parameters. It also includes features like auto code completion, comprehensive documentation, and type hints, making it faster and less error-prone to write code. This helps developers customize their ML workloads more efficiently and streamline their development process.

https://github.com/aws/amazon-sagemaker-examples
#python #airflow #apache #apache_airflow #automation #dag #data_engineering #data_integration #data_orchestrator #data_pipelines #data_science #elt #etl #machine_learning #mlops #orchestration #python #scheduler #workflow #workflow_engine #workflow_orchestration

Apache Airflow is a tool that helps you manage and automate workflows. You can write your workflows as code, making them easier to maintain, version, test, and collaborate on. Airflow lets you schedule tasks and monitor their progress through a user-friendly interface. It supports dynamic pipeline generation and is highly extensible and scalable, allowing you to define your own operators and executors.

Using Airflow benefits you by making your workflows more organized, efficient, and reliable. It simplifies the process of managing complex tasks and provides clear visualizations of your workflow's performance, helping you identify and troubleshoot issues quickly. This makes it easier to manage data processing and other automated tasks effectively.

https://github.com/apache/airflow
#python #autogluon #automated_machine_learning #automl #computer_vision #data_science #deep_learning #ensemble_learning #forecasting #gluon #hyperparameter_optimization #machine_learning #natural_language_processing #object_detection #python #pytorch #scikit_learn #structured_data #tabular_data #time_series #transfer_learning

AutoGluon makes machine learning easy and fast. With just a few lines of code, you can train and use high-accuracy models for images, text, time series, and tabular data. This means you can quickly build and deploy powerful machine learning models without needing to write a lot of code. It supports Python 3.8 to 3.11 and works on Linux, macOS, and Windows, making it convenient for various users. This saves time and effort, allowing you to focus on other parts of your project.

https://github.com/autogluon/autogluon
#python #artificial_intelligence #dag #data_science #data_visualization #dataflow #developer_tools #machine_learning #notebooks #pipeline #python #reactive #web_app

Marimo is a powerful tool for Python users that makes working with notebooks much easier and more efficient. Here's what it offers:
- **Reactive**: When you run a cell or interact with UI elements, marimo automatically updates dependent cells, keeping your code and outputs consistent.
- **Reproducible**: Marimo ensures no hidden state and deterministic execution, making your work reliable.
- **Executable**: Notebooks are stored as pure `.py` files, making version control easy.
- **Modern Editor**: It includes features like GitHub Copilot, AI assistants, and more quality-of-life tools.

Using marimo helps you avoid errors, keeps your code organized, and makes sharing and deploying your work simpler.

https://github.com/marimo-team/marimo
#python #automation #data #data_engineering #data_ops #data_science #infrastructure #ml_ops #observability #orchestration #pipeline #prefect #python #workflow #workflow_engine

Prefect is a tool that helps you automate and manage data workflows in Python. It makes it easy to turn your scripts into reliable and flexible workflows that can handle unexpected changes. With Prefect, you can schedule tasks, retry failed operations, and monitor your workflows. You can install it using `pip install -U prefect` and start creating workflows with just a few lines of code. This helps data teams work more efficiently, reduce errors, and save time. You can also use Prefect Cloud for more advanced features and support.
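A minimal sketch of turning a script into a Prefect flow with retries (the task and flow names are illustrative; assumes `prefect` 2.x+):

```python
from prefect import flow, task

@task(retries=2)  # failed runs of this task are retried automatically
def fetch_numbers() -> list:
    return [1, 2, 3]

@task
def total(values) -> int:
    return sum(values)

@flow
def pipeline() -> int:
    # Calling tasks inside a flow records their runs for monitoring.
    return total(fetch_numbers())

if __name__ == "__main__":
    print(pipeline())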

https://github.com/PrefectHQ/prefect
#other #ai #data_science #devops #engineering #federated_learning #machine_learning #ml #mlops #software_engineering

This resource is a comprehensive guide to Machine Learning Operations (MLOps), providing a wide range of tools, articles, courses, and communities to help you manage and deploy machine learning models effectively.

**Key Benefits**:
- **Learning Materials**: Access to numerous books, articles, courses, and talks on MLOps, machine learning, and data science.
- **Practical Guides**: Detailed guides on workflow management, feature stores, model deployment, testing, monitoring, and maintenance.
- **Responsible AI**: Resources on model governance, ethics, and responsible AI practices.

Using these resources, you can improve your skills in designing, training, and running machine learning models efficiently, ensuring they are reliable, scalable, and maintainable in production environments.

https://github.com/visenger/awesome-mlops
#python #cleandata #data_engineering #data_profilers #data_profiling #data_quality #data_science #data_unit_tests #datacleaner #datacleaning #dataquality #dataunittest #eda #exploratory_analysis #exploratory_data_analysis #exploratorydataanalysis #mlops #pipeline #pipeline_debt #pipeline_testing #pipeline_tests

GX Core is a powerful tool for ensuring data quality. It allows you to write simple tests, called "Expectations," to check if your data meets certain standards. This helps teams work together more effectively and keeps everyone informed about the data's quality. You can automatically generate reports, making it easy to share results and preserve your organization's knowledge about its data. To get started, you just need to install GX Core in a Python virtual environment and follow some simple steps. This makes managing data quality much simpler and more efficient.

https://github.com/great-expectations/great_expectations
#python #ai #csv #data #data_analysis #data_science #data_visualization #database #datalake #gpt_4 #llm #pandas #sql #text_to_sql

PandaAI is a tool that lets you ask questions about your data using natural language. It's helpful for both non-technical and technical users. Non-technical users can interact with data more easily, while technical users can save time and effort. You can load your data, save it as a dataframe, and then ask questions like "Which are the top 5 countries by sales?" or "What is the total sales for the top 3 countries?" PandaAI also allows you to visualize charts and work with multiple datasets. It's easy to install using pip or poetry and can be used in Jupyter notebooks, Streamlit apps, or even a secure Docker sandbox. This makes it simpler and more efficient to analyze your data.

https://github.com/sinaptik-ai/pandas-ai
#python #ai #artificial_intelligence #cython #data_science #deep_learning #entity_linking #machine_learning #named_entity_recognition #natural_language_processing #neural_network #neural_networks #nlp #nlp_library #python #spacy #text_classification #tokenization

spaCy is a powerful tool for understanding and processing human language. It helps computers analyze text by breaking it into parts like words, sentences, and entities (like names or places). This makes it useful for tasks such as identifying who is doing what in a sentence or finding specific information from large texts. Using spaCy can save time and improve accuracy compared to manual analysis. It supports many languages and integrates well with advanced models like BERT, making it ideal for real-world applications.
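A small sketch of the text-to-parts breakdown described above (uses a blank English pipeline so no model download is needed; the sample sentence is made up):

```python
import spacy

# A blank pipeline ships with the tokenizer only; trained pipelines such as
# "en_core_web_sm" would add entities, part-of-speech tags, and parsing.
nlp = spacy.blank("en")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Each token is one word or punctuation unit identified by spaCy.
tokens = [token.text for token in doc]
print(tokens)
```

With a trained pipeline, the same `doc` object also exposes `doc.ents` for the named entities mentioned in the summary.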

https://github.com/explosion/spaCy
#python #agent #ai #automation #data_mining #data_science #development #llm #research

RD-Agent is a tool that helps automate research and development (R&D) tasks. It can read reports, propose new ideas, and implement them using data. This tool acts like a copilot for researchers, automating repetitive tasks or working independently to suggest better solutions. RD-Agent supports various scenarios, such as finance and medical fields, making it easier to streamline model development and data analysis. By using RD-Agent, users can save time and boost productivity in their R&D work.

https://github.com/microsoft/RD-Agent
#other #automl #chatgpt #data_analysis #data_science #data_visualization #data_visualizations #deep_learning #gpt #gpt_3 #jax #keras #machine_learning #ml #nlp #python #pytorch #scikit_learn #tensorflow #transformer

This is a comprehensive, regularly updated list of 920 top open-source Python machine learning libraries, organized into 34 categories like frameworks, data visualization, NLP, image processing, and more. Each project is ranked by quality using GitHub and package manager metrics, helping you find the best tools for your needs. Popular libraries like TensorFlow, PyTorch, scikit-learn, and Hugging Face transformers are included, along with specialized ones for time series, reinforcement learning, and model interpretability. This resource saves you time by guiding you to high-quality, actively maintained libraries for building, optimizing, and deploying machine learning models efficiently.

https://github.com/ml-tooling/best-of-ml-python
#python #data_mining #data_science #deep_learning #deep_reinforcement_learning #genetic_algorithm #machine_learning #machine_learning_from_scratch

This project offers Python code for many basic machine learning models and algorithms built from scratch, focusing on clear, understandable implementations rather than speed or optimization. You can learn how these algorithms work inside by running examples like polynomial regression, convolutional neural networks, clustering, and genetic algorithms. This hands-on approach helps you deeply understand machine learning concepts and build your own custom models. Using Python makes it easier because of its simple, readable code and flexibility, letting you quickly test and modify algorithms. This can improve your skills and confidence in machine learning development.
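In the same from-scratch spirit, here is what a bare-numpy implementation might look like for linear regression trained by gradient descent (an illustrative example, not code from the repository):

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.05, epochs=500):
    """Fit y ≈ Xw + b by batch gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        y_pred = X @ w + b
        error = y_pred - y
        # Gradients of MSE with respect to the weights and the bias.
        w -= lr * (2 / n_samples) * (X.T @ error)
        b -= lr * (2 / n_samples) * error.sum()
    return w, b

# Toy data generated from y = 2x + 1; the fit should recover w≈2, b≈1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
w, b = fit_linear_regression(X, y)
print(w, b)
```

Writing the update rule out by hand like this, instead of calling a library fit method, is exactly the kind of transparency the repository aims for.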

https://github.com/eriklindernoren/ML-From-Scratch
#html #data_science #education #machine_learning #machine_learning_algorithms #machinelearning #machinelearning_python #microsoft_for_beginners #ml #python #r #scikit_learn #scikit_learn_python

Microsoft's "Machine Learning for Beginners" is a free, 12-week course with 26 lessons designed to teach classic machine learning using Python and Scikit-learn. It includes quizzes, projects, and assignments to help you learn by doing, with lessons themed around global cultures to keep it engaging. You can access solutions, videos, and even R language versions. The course is beginner-friendly, flexible, and helps build practical skills step-by-step, making it easier to understand and apply machine learning concepts in real-world scenarios. This structured approach boosts your learning retention and prepares you for further study or career growth in ML.

https://github.com/microsoft/ML-For-Beginners
#other #artificial_intelligence #artificial_intelligence_projects #awesome #computer_vision #computer_vision_project #data_science #deep_learning #deep_learning_project #machine_learning #machine_learning_projects #nlp #nlp_projects #python

You can access a huge, constantly updated list of over 500 artificial intelligence projects with ready-to-use code covering machine learning, deep learning, computer vision, and natural language processing. This collection includes projects for beginners and advanced users, with links to tutorials, datasets, and real-world applications like chatbots, healthcare, and time series forecasting. Using this resource helps you learn AI by doing practical projects, speeding up your coding skills, and building a strong portfolio for jobs or research. It saves you time searching for quality projects and gives you tested, working code to study and modify.

https://github.com/ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code