AWS Lambda Abuse
When you deploy an endpoint that is open to the world, you open it not only for use but also for abuse.
AWS provides services that mitigate common abuse patterns, such as AWS Shield, which protects against DDoS attacks. But even Shield can't know what is and isn't abusive for your particular application.
Of course, if your Lambda function is private, then you should use one of the API Gateway security mechanisms to prevent abuse:
- IAM authorization
- API keys
- Custom (Lambda) authorizers
If one of them is present, the Lambda function can only be called by authorized users. Cool!
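For example, an endpoint with API key security enabled simply rejects anonymous calls. A quick sketch with the requests library (the URL and key are placeholders):

import requests

# API Gateway checks the x-api-key header when API key security is enabled
resp = requests.get(
    "https://abc123.execute-api.us-east-1.amazonaws.com/prod/hello",
    headers={"x-api-key": "my-client-api-key"},
)
print(resp.status_code)  # 403 Forbidden without a valid key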
But what to do when your Lambda is open to the public?
Not much, unfortunately.
One of the factors you might want to control is concurrency: the number of simultaneous requests supported per account and per function. You are billed for each request plus the memory allocated over the duration of each request, so this is the unit you want to cap. Here is a pretty cool post about this. By the way, you can control this even at the Zappa settings level(!), check lambda_concurrency.
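For reference, here is roughly what capping a function's concurrency looks like with plain boto3 (the function name is a placeholder; Zappa's lambda_concurrency setting does the equivalent at deploy time):

import boto3

lam = boto3.client("lambda")
# Reserve (and thereby cap) the number of concurrent executions
lam.put_function_concurrency(
    FunctionName="my-public-endpoint",
    ReservedConcurrentExecutions=10,
)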
In addition to the per-account and per-function limits on Lambda calls, you can also control Lambda exposure by putting API Gateway in front of it and creating API Gateway usage plans.
Using API Gateway limits to create usage plans per customer, you can control API and Lambda access and prevent runaway billing on your account.
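A minimal boto3 sketch of such a usage plan (all ids, names and limits are placeholders):

import boto3

apigw = boto3.client("apigateway")

# Throttle requests per second and cap total requests per month
plan = apigw.create_usage_plan(
    name="per-customer-basic",
    throttle={"rateLimit": 10.0, "burstLimit": 20},
    quota={"limit": 10000, "period": "MONTH"},
    apiStages=[{"apiId": "a1b2c3d4e5", "stage": "production"}],
)

# Attach a customer's API key to the plan
apigw.create_usage_plan_key(
    usagePlanId=plan["id"],
    keyId="customer-api-key-id",
    keyType="API_KEY",
)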
#aws
Medium
The Everything Guide to Lambda Throttling, Reserved Concurrency, and Execution Limits
What you need to know, to know, that you know a little somethin-somethin…
The future of data science is data engineering
One aspect of data science that’s often over-emphasized is model tuning.
It's very rare that the focus of a data scientist will be on making a model 1% better. Typically it's much more important to get a "good enough" model out the door and in front of users. A "good enough" model in production is 100x better than a 5-10% more performant model in a Jupyter notebook. That is why software engineering and deployment skills are growing in importance relative to model tuning.
#ds #big_data
How Python Can Help You Live a Better Life
You can find more funny titles for your posts here
Will definitely use it on my blog :)
#usefullinks
Portent
Portent's Content Idea Generator - Instant Blog Topic Inspiration
Frustrated by the same old titles? Get creative with Portent's Content Idea Generator. Certain to knock writer's block down, enter a keyword and give it a spin.
The clear sign of a good interview is a slight feeling of hatred on both sides.
The application runs on Google AI algorithms. The neural network answers users' questions with quotes from books. It can handle abstract questions like "what is the meaning of life?", check it out
https://books.google.com/talktobooks/.
#usefullinks
Google
Talk to Books
Talk to Books was a new way to explore ideas and discover books. Select one of
the samples to view its archived search results.
Over time, you may have a dozen copies of the same file lying in different corners of your system. The best idea is to track them down and eliminate them before they gain control of your hard drive.
FSlint is a utility to find and clean up various forms of lint on the file system: duplicate files, empty directories, bad IDs, redundant temp files and broken symlinks. As far as I know, it only supports Linux.
$ sudo apt install fslint
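If you're curious how duplicate finding works at its core, the idea is grouping files by content hash. A toy Python sketch of that idea (not FSlint's actual implementation):

import hashlib
import os

def find_duplicates(root):
    # Map content hash -> list of file paths with that content
    seen = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            seen.setdefault(digest, []).append(path)
    return [paths for paths in seen.values() if len(paths) > 1]

print(find_duplicates("/tmp"))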
www.pixelbeat.org
FSlint - Duplicate file finder for linux
A toolkit to find lint on a filesystem
ETL vs ELT
ETL (Extract, Transform, Load) is a popular data processing paradigm in data warehousing. Essentially, we extract data from a source or sources, clean it up, convert it into the structured information we need, and load it into a target database, data warehouse or data lake.
Currently there is some movement from ETL to ELT, where the transformation takes place inside the data warehouse rather than up front.
It seems to me that this, like most data management approaches and tools, stems from companies' lack of knowledge about their own data. Traditionally, a lot of planning and rigor had to go into loading data into a warehouse to make it accessible to other people. Then the format of the input data changes, then the format of the output structure, and so on.
Tools such as Snowflake and AWS Redshift let you create an abstraction layer over the loaded data (even unstructured data), giving you a simple SQL API over it and letting you forget about the letter T.
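A toy, runnable illustration of the ELT order of operations, with sqlite3 standing in for the warehouse (table names, rows and the cleanup rule are made up):

import sqlite3

source = [("alice", "2019-01-01", "200"), ("bob", "2019-01-02", "oops")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_events (user TEXT, day TEXT, amount TEXT)")

# E and L: land the raw, possibly dirty data as-is
con.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", source)

# The T happens afterwards, inside the warehouse, in plain SQL
con.execute("""
    CREATE TABLE facts AS
    SELECT user, day, CAST(amount AS REAL) AS amount
    FROM raw_events
    WHERE amount GLOB '[0-9]*'
""")
print(con.execute("SELECT * FROM facts").fetchall())  # only the clean row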
#big_data
Snowflake
The Snowflake AI Data Cloud - Mobilize Data, Apps, and AI
Snowflake enables organizations to collaborate, build AI-powered data apps, and unlock data insights—all within a secure and scalable AI Data Cloud.
Ask stupid questions
You don't want to be the stupid guy in the room; neither do I. That is why you are scared of asking questions in case they are dumb.
But there is always sense in nonsense. Start by asking stupid questions and they will lead you to sensible questions.
One who asks is a fool for a minute; one who fails to ask is a fool forever. But try asking Google first, and make sure you end up with structured questions so you don't annoy people.
#dev #soft_skills
I want to be an optimist like Trump; he constantly says so many positive adjectives that I've never used in my life: great, incredible, tremendous, successful, classy, winning...
Somebody should do analytics on his speeches, it would be an incredible job
If I can't call a moron a moron on GitHub, then what is open in open source?
https://medium.com/@catamphetamine/how-github-blocked-me-and-all-my-libraries-c32c61f061d3
Optional arguments MUST use keywords (Python 3): a quick explanation of part of PEP 3102
https://luminousmen.com/post/optional-arguments-must-use-keywords
Blog | iamluminousmen
Optional arguments MUST use keywords (Python3)
Optional arguments MUST use keywords (Python 3), the quick explanation of part of PEP-3102
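A minimal sketch of the PEP 3102 rule from the post above: everything after a bare * in the signature can only be passed by keyword (the function itself is made up):

def connect(host, port, *, timeout=10, retries=3):
    return host, port, timeout, retries

connect("db.local", 5432, timeout=5)   # OK: optional args passed by keyword
# connect("db.local", 5432, 5)         # TypeError: takes 2 positional arguments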
I couldn't resist sharing it with you guys. Learn a strange approach to sorting a huge array on a device with limited memory using network latency. How's that?
https://stackoverflow.com/questions/12748246/sorting-1-million-8-digit-numbers-in-1-mb-of-ram/13000176#13000176
#stuff
Stack Overflow
Sorting 1 million 8-decimal-digit numbers with 1 MB of RAM
I have a computer with 1 MB of RAM and no other local storage. I must use it to accept 1 million 8-digit decimal numbers over a TCP connection, sort them, and then send the sorted list out over ano...
Fixing bugs in code
Descriptive statistics will teach you the basic concepts used to describe a data sample
https://luminousmen.com/post/descriptive-and-inferential-statistics
Blog | iamluminousmen
Data Science. Descriptive and Inferential Statistics
Discover the foundations of Descriptive and Inferential Statistics in Data Science. Learn how to analyze data samples effectively. Start understanding your data today!
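As a taste of the basic descriptive measures from the post above, a tiny sketch with the standard library (the sample is made up):

import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.mean(sample))    # 5.0  - central tendency
print(statistics.median(sample))  # 4.5  - robust center
print(statistics.pstdev(sample))  # 2.0  - spread around the mean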
.pth files
Most likely, few people know why these files are needed, and even fewer need to know about them.
But sometimes it happens: you need to make sure that on startup Python has all the directories you want in its search path automatically (I mean in sys.path). For such a problem Python has its own built-in solution called .pth files.
In your .pth file, list all the directory paths that you want Python to consider on startup, one after the other. Save the file with any name, but with the .pth extension, in the "site-packages" folder of your interpreter.
While it solves the problem, this solution is not ideal because the changes are global for the system (but if you're creating a new OS distribution, it may solve your problems better). A better solution is to use sys.path.append or the PYTHONPATH variable, or, even better, use Docker.
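A tiny sketch of the mechanics (the directory path is a placeholder): a .pth file is just a text file with one directory per line, dropped into site-packages.

import site

# Where your interpreter looks for .pth files
print(site.getsitepackages())

# my_paths.pth, placed in one of those folders, could contain e.g.:
# /home/me/my_project/libs
# After an interpreter restart, those directories show up in sys.path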
#python
ML system basic framework
I have written a lot about the basics of statistics and data science for data engineers (https://luminousmen.com/?tag=data_science), and it seems to me that this knowledge is necessary for everyone who wants to develop further as an MLE. In addition to helping you understand and optimize what comes out of data scientists' work, it will also help you at a higher level.
There is such an area of knowledge as Machine Learning System Design. I don't know whether it is a discipline in its own right yet, but in any case it will soon become one. A lot of activity at this level requires an understanding of the topics I write about.
In the most general form, the activities can be divided into the following steps (consider it a basic framework):
1. Requirements clarification
2. Design an ML system
3. Data pipelines
4. Modeling
5. Monitoring the model performance
6. Serving & System deployment
7. Scaling
Many questions arise when designing a system:
Data exploration - what does the data look like?
What's our goal? CTR? Do we need to automatically check the performance of the model? A/B tests?
How do different features relate to the target? What are the types of our features?
Is the data balanced? If not, do we need to oversample/undersample? (see the sketch after this list)
Are there missing values?
How to select a model?
Many questions, many tradeoffs...
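The balance question, for instance, is cheap to check. A toy sketch with made-up labels:

from collections import Counter

y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # toy target labels
counts = Counter(y)
print(counts)  # Counter({0: 8, 1: 2}) -> heavily imbalanced

majority = max(counts.values())
print({label: n / majority for label, n in counts.items()})
# {0: 1.0, 1: 0.25} -> consider oversampling class 1 or undersampling class 0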
#ml #architecture #system_design
Blog | iamluminousmen
Blog | luminousmen - mostly Python, Data Engineering and Machine Learning
helping robots conquer the earth and trying not to increase entropy using Python, Data Engineering, Machine Learning