AWS Lambda Abuse
When you deploy an endpoint that is open to the world, you open it not only for use but also for abuse.
AWS provides services that mitigate common abuse patterns, such as AWS Shield, which protects against DDoS attacks. But even Shield can't know what is and isn't abusive for your particular application.
Of course, if your Lambda function is private, then you should use one of the API Gateway security mechanisms to prevent abuse:
- IAM authorization
- API keys
- Custom (Lambda) authorizers
If one of them is present, the Lambda function can only be called by authorized users. Cool!
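For example, an endpoint with API key security enabled simply rejects anonymous calls. A quick sketch with the requests library (the URL and key are placeholders):

import requests

# API Gateway checks the x-api-key header when API key security is enabled
resp = requests.get(
    "https://abc123.execute-api.us-east-1.amazonaws.com/prod/hello",
    headers={"x-api-key": "my-client-api-key"},
)
print(resp.status_code)  # 403 Forbidden without a valid key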
But what to do when your Lambda is open to the public?
Not much, unfortunately.
One of the factors you might want to control is concurrency: the number of simultaneous requests supported per account and per function. You are billed for each request plus the memory allocated over the duration of each request, so this is the unit you want to cap. Here is a pretty cool post about this. By the way, you can control this even at the Zappa settings level(!), check lambda_concurrency.
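For reference, here is roughly what capping a function's concurrency looks like with plain boto3 (the function name is a placeholder; Zappa's lambda_concurrency setting does the equivalent at deploy time):

import boto3

lam = boto3.client("lambda")
# Reserve (and thereby cap) the number of concurrent executions
lam.put_function_concurrency(
    FunctionName="my-public-endpoint",
    ReservedConcurrentExecutions=10,
)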
In addition to the per-account and per-function limits on Lambda calls, you can also control Lambda exposure by putting API Gateway in front of it and creating API Gateway usage plans.
Using API Gateway limits to create usage plans per customer, you can control API and Lambda access and prevent runaway billing on your account.
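A minimal boto3 sketch of such a usage plan (all ids, names and limits are placeholders):

import boto3

apigw = boto3.client("apigateway")

# Throttle requests per second and cap total requests per month
plan = apigw.create_usage_plan(
    name="per-customer-basic",
    throttle={"rateLimit": 10.0, "burstLimit": 20},
    quota={"limit": 10000, "period": "MONTH"},
    apiStages=[{"apiId": "a1b2c3d4e5", "stage": "production"}],
)

# Attach a customer's API key to the plan
apigw.create_usage_plan_key(
    usagePlanId=plan["id"],
    keyId="customer-api-key-id",
    keyType="API_KEY",
)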
#aws
Medium
The Everything Guide to Lambda Throttling, Reserved Concurrency, and Execution Limits
What you need to know, to know, that you know a little somethin-somethin…
The future of data science is data engineering
One aspect of data science that’s often over-emphasized is model tuning.
It's very rare that the focus of a data scientist will be on making a model 1% better. Typically it's much more important to get a "good enough" model out the door and in front of users. A "good enough" model in production is 100x better than a 5-10% more performant model in a Jupyter notebook. That is why software engineering and deployment skills are growing in importance relative to model tuning.
#ds #big_data
How Python Can Help You Live a Better Life
You can find more funny titles for your posts here
Will definitely use it on my blog :)
#usefullinks
Portent
Portent's Content Idea Generator - Instant Blog Topic Inspiration
Frustrated by the same old titles? Get creative with Portent's Content Idea Generator. Certain to knock writer's block down, enter a keyword and give it a spin.
The clear sign of a good interview is a slight feeling of hatred on both sides.
The application runs on Google AI algorithms. The neural network answers users' questions with quotes from books. It can handle abstract questions like "what is the meaning of life?", check it out
https://books.google.com/talktobooks/.
#usefullinks
Google
Talk to Books
Talk to Books was a new way to explore ideas and discover books. Select one of
the samples to view its archived search results.
Over time, you may have a dozen copies of the same file lying in different corners of your system. The best idea is to track them down and eliminate them before they gain control of your hard drive.
FSlint is a utility to find and clean up various forms of lint on the file system: duplicate files, empty directories, bad IDs, redundant temp files and broken symlinks. As far as I know, it only supports Linux.
$ sudo apt install fslint
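If you're curious how duplicate finding works at its core, the idea is grouping files by content hash. A toy Python sketch of that idea (not FSlint's actual implementation):

import hashlib
import os

def find_duplicates(root):
    # Map content hash -> list of file paths with that content
    seen = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            seen.setdefault(digest, []).append(path)
    return [paths for paths in seen.values() if len(paths) > 1]

print(find_duplicates("/tmp"))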
www.pixelbeat.org
FSlint - Duplicate file finder for linux
A toolkit to find lint on a filesystem
ETL vs ELT
ETL (Extract, Transform, Load) is a popular data processing paradigm in data warehousing. Essentially, we extract data from a source or sources, clean it up, convert it into the structured information we need, and load it into a target database, data warehouse or data lake.
Currently there is some movement from ETL to ELT, where the transformation takes place inside the data warehouse rather than up front.
It seems to me that this, like most data management approaches and tools, stems from companies' lack of knowledge about their own data. Traditionally, a lot of planning and rigor had to go into loading data into a warehouse to make it accessible to other people. Then the format of the input data changes, then the format of the output structure, and so on.
Tools such as Snowflake and AWS Redshift let you create an abstraction layer over the loaded data (even unstructured data), giving you a simple SQL API over it and letting you forget about the letter T.
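A toy, runnable illustration of the ELT order of operations, with sqlite3 standing in for the warehouse (table names, rows and the cleanup rule are made up):

import sqlite3

source = [("alice", "2019-01-01", "200"), ("bob", "2019-01-02", "oops")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_events (user TEXT, day TEXT, amount TEXT)")

# E and L: land the raw, possibly dirty data as-is
con.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", source)

# The T happens afterwards, inside the warehouse, in plain SQL
con.execute("""
    CREATE TABLE facts AS
    SELECT user, day, CAST(amount AS REAL) AS amount
    FROM raw_events
    WHERE amount GLOB '[0-9]*'
""")
print(con.execute("SELECT * FROM facts").fetchall())  # only the clean row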
#big_data
Snowflake
The Snowflake AI Data Cloud - Mobilize Data, Apps, and AI
Snowflake enables organizations to collaborate, build AI-powered data apps, and unlock data insights—all within a secure and scalable AI Data Cloud.
Ask stupid questions
You don't want to be the stupid guy in the room; neither do I. That is why you are scared of asking questions in case they are dumb.
But there is always sense in nonsense. Start by asking stupid questions and they will lead you to sensible questions.
One who asks is a fool for a minute; one who fails to ask is a fool forever. But try asking Google first, and make sure you end up with structured questions so you don't annoy people.
#dev #soft_skills
I want to be an optimist like Trump; he constantly says so many positive adjectives that I've never used in my life: great, incredible, tremendous, successful, classy, winning...
Somebody should do analytics on his speeches, it would be an incredible job
If I can't call a moron a moron on GitHub, then what is open in open source?
https://medium.com/@catamphetamine/how-github-blocked-me-and-all-my-libraries-c32c61f061d3
Optional arguments MUST use keywords (Python 3): a quick explanation of part of PEP 3102
https://luminousmen.com/post/optional-arguments-must-use-keywords
Blog | iamluminousmen
Optional arguments MUST use keywords (Python3)
Optional arguments MUST use keywords (Python 3), the quick explanation of part of PEP-3102
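A minimal sketch of the PEP 3102 rule from the post above: everything after a bare * in the signature can only be passed by keyword (the function itself is made up):

def connect(host, port, *, timeout=10, retries=3):
    return host, port, timeout, retries

connect("db.local", 5432, timeout=5)   # OK: optional args passed by keyword
# connect("db.local", 5432, 5)         # TypeError: takes 2 positional arguments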
I couldn't resist sharing it with you guys. Learn a strange approach to sorting a huge array on a device with limited memory using network latency. How's that?
https://stackoverflow.com/questions/12748246/sorting-1-million-8-digit-numbers-in-1-mb-of-ram/13000176#13000176
#stuff
Stack Overflow
Sorting 1 million 8-decimal-digit numbers with 1 MB of RAM
I have a computer with 1 MB of RAM and no other local storage. I must use it to accept 1 million 8-digit decimal numbers over a TCP connection, sort them, and then send the sorted list out over ano...
Fixing bugs in code
Descriptive statistics will teach you the basic concepts used to describe a data sample
https://luminousmen.com/post/descriptive-and-inferential-statistics
Blog | iamluminousmen
Data Science. Descriptive and Inferential Statistics
Discover the foundations of Descriptive and Inferential Statistics in Data Science. Learn how to analyze data samples effectively. Start understanding your data today!
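As a taste of the basic descriptive measures from the post above, a tiny sketch with the standard library (the sample is made up):

import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.mean(sample))    # 5.0  - central tendency
print(statistics.median(sample))  # 4.5  - robust center
print(statistics.pstdev(sample))  # 2.0  - spread around the mean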
.pth files
Most likely, few people know why these files are needed, and even fewer need to know about them.
But sometimes it happens: you need to make sure that on startup Python has all the directories you want in its search path automatically (I mean in sys.path). For such a problem Python has its own built-in solution called .pth files.
In your .pth file, list all the directory paths that you want Python to consider on startup, one after the other. Save the file with any name, but with the .pth extension, in the "site-packages" folder of your interpreter.
While it solves the problem, this solution is not ideal because the changes are global for the system (but if you're creating a new OS distribution, it may solve your problems better). A better solution is to use sys.path.append or the PYTHONPATH variable, or, even better, use Docker.
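A tiny sketch of the mechanics (the directory path is a placeholder): a .pth file is just a text file with one directory per line, dropped into site-packages.

import site

# Where your interpreter looks for .pth files
print(site.getsitepackages())

# my_paths.pth, placed in one of those folders, could contain e.g.:
# /home/me/my_project/libs
# After an interpreter restart, those directories show up in sys.path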
#python
ML system basic framework
I have written a lot about the basics of statistics and data science for data engineers (https://luminousmen.com/?tag=data_science), and it seems to me that this knowledge is necessary for everyone who wants to develop further as an MLE. In addition to helping you understand and optimize what comes out of data scientists' work, it will also help you at a higher level.
There is such an area of knowledge as Machine Learning System Design. I don't know whether it is a discipline in its own right yet, but in any case it will soon become one. A lot of activity at this level requires an understanding of the topics I write about.
In the most general form, the activities can be divided into the following steps (consider it a basic framework):
1. Requirements clarification
2. Design an ML system
3. Data pipelines
4. Modeling
5. Monitoring the model performance
6. Serving & System deployment
7. Scaling
Many questions arise when designing a system:
Data exploration - what does the data look like?
What's our goal? CTR? Do we need to automatically check the performance of the model? A/B tests?
How do different features relate to the target? What are the types of our features?
Is the data balanced? If not, do we need to oversample/undersample? (see the sketch after this list)
Are there missing values?
How to select a model?
Many questions, many tradeoffs...
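The balance question, for instance, is cheap to check. A toy sketch with made-up labels:

from collections import Counter

y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # toy target labels
counts = Counter(y)
print(counts)  # Counter({0: 8, 1: 2}) -> heavily imbalanced

majority = max(counts.values())
print({label: n / majority for label, n in counts.items()})
# {0: 1.0, 1: 0.25} -> consider oversampling class 1 or undersampling class 0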
#ml #architecture #system_design
Blog | iamluminousmen
Blog | luminousmen - mostly Python, Data Engineering and Machine Learning
helping robots conquer the earth and trying not to increase entropy using Python, Data Engineering, Machine Learning