GitHub Trends
See what the GitHub community is most excited about today.

A bot automatically fetches new repositories from https://github.com/trending and sends them to the channel.

Author and maintainer: https://github.com/katursis
#javascript #batch_processing #batch_script #code_free #crawler #data_collection #frontend #gui #html #input_parameters #layman #parameters #robotics #rpa #scraper #spider #visual #visualization #visualprogramming #web #www

EasySpider is a free, code-free web crawler that helps you collect data from websites easily. Instead of writing code, you design a scraping task visually by selecting the content you want on a web page and following the prompts. Designed tasks can also be executed from the command line, which makes them easy to integrate into other systems. It supports various proxy services and captcha solutions to make data collection more efficient, and it can save you a lot of time and effort in collecting web data.
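
The command-line mode mentioned above looks roughly like the sketch below. The executable name and the `--ids`/`--user_data` flags are recalled from the project's task-execution docs; treat them as assumptions and verify them against the repo's wiki before relying on them.

```
# Hypothetical sketch: execute previously designed task #1 headlessly.
# Executable and flag names are assumptions; confirm in the EasySpider wiki.
./easyspider_executestage --ids "[1]" --user_data 1
```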

https://github.com/NaiboWang/EasySpider
#go #batch_systems #bigdata #gene #golang #hpc #kubernetes #machine_learning

Volcano is a powerful batch system built on Kubernetes, designed to run complex workloads such as machine learning, bioinformatics, and big data applications, and it integrates with popular frameworks such as TensorFlow, Spark, and PyTorch. It provides efficient scheduling and management of high-performance workloads, drawing on over fifteen years of experience and best practices from the open source community. Volcano is widely used across industries and has strong community support, with hundreds of contributors. Installation is straightforward, through either YAML manifests or Helm charts, making it easy to get started and manage your batch workloads effectively.
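
To give a concrete feel for the workload API, here is a minimal Volcano Job manifest. The `batch.volcano.sh/v1alpha1` API group and `schedulerName: volcano` come from the project's docs; the job name, image, and replica count are placeholder assumptions.

```
# Minimal Volcano Job: three pods scheduled as a gang
# (nothing starts until all minAvailable pods can be placed).
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: demo-job             # placeholder name
spec:
  minAvailable: 3            # gang scheduling: all-or-nothing
  schedulerName: volcano     # hand these pods to Volcano, not default-scheduler
  queue: default
  tasks:
    - replicas: 3
      name: worker
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: worker
              image: busybox                              # placeholder image
              command: ["sh", "-c", "echo hello volcano"]
```

Once Volcano is installed, the job is submitted like any other Kubernetes resource: `kubectl apply -f demo-job.yaml`.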

https://github.com/volcano-sh/volcano
#java #batch #cdc #change_data_capture #data_integration #data_pipeline #distributed #elt #etl #flink #kafka #mysql #paimon #postgresql #real_time #schema_evolution

Flink CDC is a tool for moving and transforming data in real time or in batches. It makes data integration simple: a YAML file describes how data should be moved and transformed, and the framework takes care of the rest. It offers full-database synchronization, table sharding, schema evolution, and data transformation. To use it, you set up an Apache Flink cluster, download Flink CDC, write a YAML file defining your data sources and sinks, and then submit the job. This makes it far easier to manage and integrate data efficiently across different databases.
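
The YAML pipeline definition looks roughly like the sketch below, which follows the shape of the project's MySQL-to-Doris quickstart; hostnames, credentials, and table patterns are placeholder assumptions.

```
# mysql-to-doris.yaml: sync all of app_db from MySQL into Doris,
# propagating schema changes (schema evolution) along the way.
source:
  type: mysql
  hostname: localhost          # placeholder
  port: 3306
  username: root               # placeholder credentials
  password: "123456"
  tables: app_db.\.*           # pattern: every table in app_db
  server-id: 5400-5404         # one id per parallel reader

sink:
  type: doris
  fenodes: 127.0.0.1:8030      # placeholder Doris frontend address
  username: root
  password: ""

pipeline:
  name: Sync app_db to Doris
  parallelism: 2
```

As best I recall the distribution, the file is then submitted to the running Flink cluster with the bundled script, e.g. `bash bin/flink-cdc.sh mysql-to-doris.yaml`.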

https://github.com/apache/flink-cdc
#java #apache #batch #cdc #change_data_capture #data_ingestion #data_integration #elt #high_performance #offline #real_time #streaming

Apache SeaTunnel is a powerful tool for integrating and synchronizing large amounts of data from many different sources. It supports over 100 connectors, is efficient, stable, and resource-friendly, minimizing the use of computing resources and JDBC connections, and it provides real-time monitoring with data-quality guarantees against loss or duplication. Jobs can run on different execution engines: Flink, Spark, or the built-in SeaTunnel Zeta Engine. SeaTunnel simplifies complex data-synchronization tasks, offers high throughput with low latency, and gives detailed insight while a job runs. There is also a companion web project for visual job management, which makes data integration tasks easier to operate.
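
Jobs are described in a config file rather than code. The sketch below mirrors the batch template that ships with the SeaTunnel distribution: the built-in FakeSource feeding a Console sink, a common smoke test before wiring up real connectors.

```
# Smallest possible SeaTunnel job, mirroring the bundled batch template.
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    row.num = 16               # generate 16 synthetic rows
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {}                   # print rows instead of writing to a real store
}
```

On the bundled Zeta engine this runs with something like `./bin/seatunnel.sh --config ./v2.batch.demo.conf -m local`; the exact flags are recalled from the quickstart and may differ across versions.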

https://github.com/apache/seatunnel