You can only know what makes your program slow after first getting the program to give correct results, then running that correct program to see whether it is slow. If it is, profiling can show which parts of the program consume most of the time. A comprehensive but quick-to-run test suite can then ensure that future optimisations don't change the correctness of your program. In short (a quick profiling sketch follows the list):
1. Get it right.
2. Test it's right.
3. Profile if slow.
4. Optimise.
5. Repeat from 2.
Source
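For step 3, Python's built-in cProfile is one quick way to see where the time goes. A minimal sketch, assuming the standard library (the profiled function is just an illustrative stand-in):

import cProfile
import pstats

def slow_function() -> int:
    # stand-in for whatever part of your program you suspect is hot
    return sum(i * i for i in range(1_000_000))

with cProfile.Profile() as profiler:
    slow_function()

# print the ten most expensive calls, sorted by cumulative time
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)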
Story of My Recent Days
I was working with some very large CSV data. I wanted to merge 4 very large CSV files based on one column, and pandas wasn't able to handle it, so I decided to change my approach and process the files separately.
The thing is, there are 2 tasks that have to be done on it:
1. Process it and add it to the DB based on all the files [CPU bound]
2. Download each file, upload it to S3, and update the column with the S3 link [IO bound]
The first task is really fast since it all depends on the CPU, and I already got a good speedup there. But the second task was taking more than a day to finish. Here is the bummer: the task has to run every day, and it was taking more than a day to complete.
The solution I came up with was to use multiple machines and split out the IO-bound work, the downloading and uploading of files.
When I say downloading files, I am talking about millions of files. Don't ask me why; the bottom line is I have to download them and upload them to S3.
Anyway, I split the file processing across multiple machines, and I am using asyncio to its peak while being careful not to get blocked by the websites either.
Now it is going to cut the time to process the files in half, and I am happy with that.
Moral of the story: if you are dealing with an IO-bound task, maybe try multiple machines to handle it (a rough sketch of the pattern is below).
I have got a couple more stories to share but I'm too lazy to write them down.
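A minimal sketch of that fan-out pattern, assuming aiohttp; upload_to_s3 is a hypothetical stand-in for a real async S3 client (e.g. aioboto3), and the URLs are illustrative:

import asyncio
import aiohttp

MAX_CONCURRENT = 100  # tune per machine and per remote host

async def upload_to_s3(url: str, data: bytes) -> str:
    # placeholder: swap in a real async S3 upload here
    return f"s3://my-bucket/{url.rsplit('/', 1)[-1]}"

async def transfer(session: aiohttp.ClientSession, sem: asyncio.Semaphore, url: str) -> str:
    async with sem:  # bound concurrency so the source site doesn't block you
        async with session.get(url) as resp:
            resp.raise_for_status()
            data = await resp.read()
    return await upload_to_s3(url, data)

async def main(urls: list[str]) -> None:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    async with aiohttp.ClientSession() as session:
        s3_links = await asyncio.gather(*(transfer(session, sem, u) for u in urls))
    print(s3_links)  # update the DB column with these links here

if __name__ == "__main__":
    # each machine runs this over its own slice of the full URL list
    asyncio.run(main(["https://example.com/file1.bin", "https://example.com/file2.bin"]))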
Forwarded from Pavel Durov (Paul Du Rove)
Tiny Verse opens in full-screen, which looks great on desktops and tablets. Make sure to swipe and zoom to admire the 3D effects.
Playdeck's task section now features an "Add to Home Screen" option, and a flying Yeti that moves based on your device's orientation.
Major has added a custom loading screen and the new Major Maze mini-game, where you can guide a rolling ball by tilting your phone.
IMG_20241118_172629_090.jpg
To shed some light on how big the CSV files are.
Based on @frectonz's recommendation to change it to SQLite, I might get a huge performance win on the processing part of the big CSV. The file size doesn't matter for us, but the speed is a huge gain.
Plus, there was a lot of duplication in the CSV rows, and we didn't notice that until today.
After a bit of experimenting, even though the SQLite file size increased, I think the query time is much faster than the normal looping, so I am changing the approach a bit.
So the idea is to merge and process 4 CSV files: I am going to convert 3 of the CSVs to SQLite, loop through the one remaining CSV, and fetch the matching rows from the other 3 (a conversion sketch follows at the end of this post). That might be the best approach I have at the moment.
Just like this:
import sqlite3

cursor = sqlite3.connect("lookup.db").cursor()  # the 3 converted CSVs live here
for i in big_csv:
    # parameterized queries instead of f-strings: no quoting bugs, no SQL injection
    result_1 = cursor.execute("SELECT * FROM table_1 WHERE id = ?", (i,)).fetchone()
    result_2 = cursor.execute("SELECT * FROM table_2 WHERE id = ?", (i,)).fetchone()
    result_3 = cursor.execute("SELECT * FROM table_3 WHERE id = ?", (i,)).fetchone()
    # do something with the results (an index on id keeps these lookups fast)
Though this doesn't help much with the file-downloading part, I think it's a good start for the processing part.
Anyway, thanks @frectonz for the recommendation. It's super cool to have such a community.
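For completeness, here is a minimal sketch of converting one of those CSVs into a SQLite table; the file names, table name, and the id join column are assumptions, and the index is what makes the per-row lookups above fast:

import csv
import sqlite3

def csv_to_sqlite(csv_path: str, db_path: str, table: str) -> None:
    conn = sqlite3.connect(db_path)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumes the header row holds clean column names
        conn.execute(f"CREATE TABLE {table} ({', '.join(header)})")
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
    conn.execute(f"CREATE INDEX idx_{table}_id ON {table} (id)")  # index the join column
    conn.commit()
    conn.close()

csv_to_sqlite("second.csv", "lookup.db", "table_1")  # repeat for the other two CSVs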
Forwarded from Hacker News
Show HN: Embed an SQLite database in your PostgreSQL table (Score: 150+ in 11 hours)
Link: https://readhacker.news/s/6icWC
Comments: https://readhacker.news/c/6icWC
pglite-fusion is a PostgreSQL extension that allows you to embed SQLite databases into your PostgreSQL tables by enabling the creation of columns with the `SQLITE` type. This means every row in the table can have an embedded SQLite database.
In addition to the PostgreSQL `SQLITE` type, pglite-fusion provides the `query_sqlite` function for querying SQLite databases and the `execute_sqlite` function for updating them. Additional functions are listed in the project's README.
The pglite-fusion extension is written in Rust using the pgrx framework [1].
----
Implementation Details
The PostgreSQL `SQLITE` type is stored as a CBOR-encoded `Vec<u8>`. When a query is made, this `Vec<u8>` is written to a random file in the `/tmp` directory. SQLite then loads the file, performs the query, and returns the result as a table containing a single row with an array of JSON-encoded values.
The `execute_sqlite` function follows a similar process. However, instead of returning query results, it returns the contents of the SQLite file (stored in `/tmp`) as a new `SQLITE` instance.
[1] https://github.com/pgcentralfoundation/pgrx
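A hedged sketch of what using this might look like from Python, based only on the summary above: query_sqlite(db, sql) reads an embedded database and execute_sqlite(db, sql) returns an updated one. The empty_sqlite() constructor and everything else here (driver, table, names) are assumptions to check against the project README.

import psycopg  # assumes psycopg 3 and a Postgres with pglite-fusion installed

with psycopg.connect("dbname=demo") as conn:
    # one embedded SQLite database per row
    conn.execute(
        "CREATE TABLE tenants (name TEXT, db SQLITE DEFAULT "
        "execute_sqlite(empty_sqlite(), 'CREATE TABLE todos (todo TEXT)'))"
    )
    conn.execute("INSERT INTO tenants (name) VALUES ('acme')")
    # execute_sqlite returns the modified database, so write it back
    conn.execute(
        "UPDATE tenants SET db = execute_sqlite(db, %s) WHERE name = 'acme'",
        ("INSERT INTO todos VALUES ('ship it')",),
    )
    rows = conn.execute(
        "SELECT query_sqlite(db, 'SELECT * FROM todos') FROM tenants"
    ).fetchall()
    print(rows)  # rows of JSON-encoded values, per the description above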
GitHub: frectonz/pglite-fusion. Embed an SQLite database in your PostgreSQL table. AKA multitenancy has been solved.
This is not related to tech at all, but I urge you to find a couple of minutes and read the story below.
https://www-bbc-com.cdn.ampproject.org/c/s/www.bbc.com/amharic/articles/c2e714vekk1o.amp
I really am out of words at this point. May God help us and keep us on the right path.
Our kids might live in this country, at least most of them will, and is this really what we want to give them?
Hopefully our generation will bring out the good in humanity.
Thank you to all of you who read this.
Me going out to the office from home (aka around Semit Fiyel Bet)
Me arriving at the office (aka 22)
Me going up 4 floors of stairs, FYI no lift
Me realising I forgot my office key, right in front of the door
Me now drinking coffee even though I have to work, trying to decide what to do (aka Anqi Coffee)
YouTube
I Meet MrBeast To Break The Internet!!
Cristiano Ronaldo meets the biggest YouTuber, and Jimmy answers the question everyone is asking: "Am I Gonna Beat You?" Big challenges to come on the 30th!