How to know that your machine learning problem is hopeless?
Very interesting question on stack exchange:
How do you know that your data actually is hopeless and all the fancy models wouldn't do you any more good than predicting the average outcome for all cases or some other trivial solution?
https://stats.stackexchange.com/questions/222179/how-to-know-that-your-machine-learning-problem-is-hopeless
#ml
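One quick sanity check in the spirit of this question is to compare a model against the trivial "predict the average" baseline on held-out data. The sketch below is only an illustration and is not taken from the linked thread; the function names and the toy numbers are made up.

```python
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between two equally long sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def skill_vs_mean_baseline(y_train, y_test, model_predictions):
    """Return 1 - MSE(model) / MSE(mean baseline) on the test set.

    Around 0 or below: the model is no better than predicting the average.
    Close to 1: the model explains most of the variation.
    Assumes the test targets are not all equal to the training mean.
    """
    baseline = mean(y_train)
    baseline_mse = mse(y_test, [baseline] * len(y_test))
    model_mse = mse(y_test, model_predictions)
    return 1.0 - model_mse / baseline_mse


# Toy data, purely hypothetical.
y_train = [1.0, 2.0, 3.0]
y_test = [1.0, 3.0]
print(skill_vs_mean_baseline(y_train, y_test, [1.0, 3.0]))  # perfect model
print(skill_vs_mean_baseline(y_train, y_test, [2.0, 2.0]))  # same as baseline
```

If several reasonable models all score around zero on this kind of check, that is a hint (not a proof) that the features carry little signal.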
Apple believes that it is easier to teach an expert in the field to manage well than to train a manager to be an expert. Sounds logical to me.
https://hbr.org/2020/11/how-apple-is-organized-for-innovation
Harvard Business Review
How Apple Is Organized for Innovation
When Steve Jobs returned to Apple, in 1997, it had a conventional structure for a company of its size and scope. It was divided into business units, each with its own P&L responsibilities. Believing that conventional management had stifled innovation, Jobs…
Lessons learned from writing the book/1
So I wrote a book.
Writing a book is fucking hard. It is hard work, especially when you are not Stephen King, and it is even harder when the publisher sets a hard deadline. Fortunately, I had no such constraints: I published the book myself and handled the whole process from beginning to end. But I spent some time digging through Reddit threads about writing for publishers like Packt and O'Reilly, and I came to some conclusions that I want to share.
Let's start with the most interesting thing: it's unlikely that you'll be able to make money on a book. If you work with a publisher, though, you are paid an advance and then get a commission.
But no matter how you publish, with or without a publisher — you will spend a lot of time on it, far too much. If you convert the time spent on creating a concept, R&D, code writing and testing, design, formatting, publishing, and writing the material itself into hourly wages, you will realize that you "worked" at a loss.
And this is well understood not only by me but by many authors when they agree to write a book. The motivation is as follows:
- New experience
- Prestige and the opportunity to add the word "author" to your LinkedIn profile, i.e. personal brand
- The 4 upper levels of Maslow's hierarchy of needs
- "If you are doing something good, do not do it for free"
My motivation was the same.
Technical literature has a very small audience, and there is no need to have illusions about the number of sales. After all, I am a modest engineer who does not write bestsellers. Judging by my Reddit research, many authors spend more money on R&D than they end up earning. At least, on their first book.
amazon
leanpub
gumroad
#stuff
Amazon
Asynchronous programming
Lessons learned from writing the book/2
The next point that is important to realize when you think about writing a book is that you need skills.
The skills not of what you're going to write about (although it goes without saying), but the skills of writing and presenting the material.
I do not have such skills, so writing was very hard for me, even though most of the material was already more or less ready, taken from my own posts. It's hard to read the same thing for the fiftieth time and still try to shape it into clear and consistent sentences that are more or less easy to read. And I'm not even talking about the English.
It is a book about asynchronous programming. I chose the topic at random. I can't say that I write asynchronous code every day and know everything about it, but I think I have a good understanding of the concepts and I have experience writing such applications. It also seemed to me that few people understand and write about this topic at the concept level. In fact, I decided to help people like me and answer the questions that sometimes arise in my own head. Also, most of the material was already written in my blog, which made the whole process a bit easier.
Technical literature, in general, is difficult to write properly. You need a rough understanding of what the potential reader already knows. You need to choose the right terms and use them consistently. You shouldn't stray far from the topic (which I did not really succeed at). And you have to write not just clearly but with a sound structure: approach the topic or concept from the right side, move smoothly from one chapter to another, and draw conclusions, even the obvious ones.
The logic here is very simple. The material should be something you would want to read yourself and recommend to your friends, something you as an engineer would buy for your "working" library. This requires not only a thorough knowledge of the topic but also the right approach to delivering it, so that the reader understands the material and does not die of boredom.
amazon
leanpub
gumroad
#stuff
Lessons learned from writing the book/3
The author's work does not end after all chapters have been written. It is in the author's interest to help in R&D, correct formatting, participate in book marketing, and fill in product pages and descriptions.
Self-publishing is much more complicated than traditional publishing. No one provides you with an editor and a design team, so you are responsible for the whole project. If you happen to have friends or colleagues who can help you with any part of the work, it will certainly make things a little easier. But self-publishing still requires a lot of work and sometimes investment (in design, formatting, pictures, etc.).
For me, formatting the book was a nightmare: I couldn't find decent tools at all. And I wasn't just looking for free ones. I went through a cycle of LaTeX, pandoc, designer, Calibre, Google Docs, Kindle Create, iBooks Author, and several others.
As a result, I wrote and formatted everything in Google Docs and then moved it to Kindle Create for publication on Amazon.
The biggest problem with all those tools is code formatting. In Google Docs I managed to do it with widgets. But everywhere else you have two options: using pictures or writing a heap of CSS.
I stuck with images, and it looks disgraceful, but at least it does the job in all formats and sizes. In Kindle Create the text cannot even be selected!
After I finished the book, I found a better way to write and format ebooks, and I would recommend using leanpub.com for that. Not only do they provide great tools for creating ebooks, connecting authors with readers, managing sales, landing pages, and the like, they are also heavily involved in advancing the industry with new standards such as Markua (a book-targeted, enhanced flavor of Markdown). It can even generate all the required formats (pdf, epub, mobi), and it's free!
amazon
leanpub
gumroad
#stuff
Lessons learned from writing the book/4
In the end, I published the book in pdf and epub formats on several websites: amazon, leanpub, gumroad. Amazon is the easiest place to publish, but the royalties are very low: the author gets only 30%.
In my opinion, leanpub is by far the most usable, well-crafted service on the market for ebook authors, and I couldn’t recommend them more. I like leanpub for its very straightforward approach. Everything is easy to set up, and the end result is pretty nice to look at. And the royalties there are 80%! No, it's not an advertisement.
So the conclusion: I can't say that I'm very proud of my book, but I'm certainly not ashamed of it. Maybe one day I'll try writing a book again, and maybe on something closer to my topics of interest.
amazon
leanpub
gumroad
#stuff
Would you like to write a book?
Anonymous Poll
- Yep, that's in my life plans: 31%
- Want to but scared: 23%
- Nope: 46%
I hope you are already overflowing with an unrelenting desire to finish this year. It obviously fell outside the normal distribution; hopefully next year will be more representative.
Thank you for reading and motivating me to write more. Happy New Year!✨
Those fucking zoom meetings are more nerve-racking than actual in-person meetings. It's almost illegal.
How to not look like a hostage at your next Zoom meeting? Here are some tips from my experience on the topic.
Link
Blog | iamluminousmen
Guidelines for business meetings
Remote meetings have become an essential part of a workflow or even the only way of communication in various teams across the globe. How to make them effective?
Abstraction is not OOP
I was once taught at university that there are only three principles of OOP: encapsulation, inheritance, and polymorphism. Times have changed, and now another principle has been added on Wikipedia: abstraction. Now I hear it all the time in interviews, and it drives me crazy.
Abstraction is a powerful programming tool. It is what allows us to build large systems and maintain control over them.
But abstraction is not an attribute of OOP alone, nor of programming in general. The process of creating abstraction levels extends to almost all areas of human knowledge.
Have you heard about Plato's idealism? We always deal with abstractions, that is, models, and "the reality is not available to us". We can easily talk about complex mechanisms, such as a computer, an airplane turbine, or the human body, without recalling the individual details of these entities. We talk about ideas, ideal concepts, not about their flawed implementations.
There have always been abstractions in programming. Splitting up the code into sub-programs. Combining sub-programs into modules and packages. Types? Same idea.
While encapsulation, polymorphism, and inheritance are principles of OOP, abstraction is an element of OOP. It stands above the OOP principles: they implement abstraction. But this is more a philosophy than a principle...
#dev
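To make the point about sub-programs concrete, here is a small sketch with no classes at all. Each function hides the details of the one below it, which is already abstraction. The function names are my own and only serve as an illustration.

```python
import math


def mean(values):
    """Lowest level: plain arithmetic over a sequence."""
    return sum(values) / len(values)


def variance(values):
    """Built on mean(); the caller does not care how the mean is computed."""
    center = mean(values)
    return sum((v - center) ** 2 for v in values) / len(values)


def std_dev(values):
    """Built on variance(); one more layer, still no objects involved."""
    return math.sqrt(variance(values))


print(std_dev([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```

Whoever calls std_dev thinks in terms of "spread of the data", not sums and square roots, and no OOP principle was needed to get there.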
OpenAI has shared the results of its research on DALL-E, a new neural network that further extends the GPT-3 idea using Transformers, but this time for generating images from text.
DALL-E is a neural network with 12 billion parameters, trained on picture-text pairs, that creates pictures from text descriptions. More here
✨Magic
Soft skills thoughts
I feel like in western countries no one wants a person to know the depths of one particular technology. Knowledge and broad expertise across a whole stack, or even one platform, are valued much more highly. That's why all of the cloud providers are trying to include a service for every possible use case, and why platforms such as Snowflake are flourishing in the latest IPOs.
Moreover, those countries are more and more inclined toward soft skills: communication skills, teamwork, presentation skills, even sales skills. People here have understood that there is no sense in chasing technologies. It is easier and cheaper to outsource those technologies to a platform where specially trained people do everything for them.
#soft_skills
Gen C — The Covid Generation
'... And in case this isn’t forward-thinking enough for you, BofA notes that the next to come along is Gen C: The Covid generation.
"It is the generation that will have only ever known problem solving through fiscal stimulus and free government money potentially paving the way for universal basic income and health-care access," the strategists said. "Gen C will be unable to live without tech in every aspect of their lives" and "their avatars will protest virtually in the online Total Reality world with their friends on the latest cultural movement."'
Source
Bloomberg.com
Zillennials Are Going to Change Investing Forever, BofA Says
Eating meat is out, flight-shaming is in. Gen Z is transforming the world and investors need to be prepared.
Here is how to advertise your products properly. JetBrains downloaded 10,000,000 Jupyter notebooks from GitHub and analyzed them. Nothing seems surprising and everything looks obvious, but there are a few things I noticed.
▪️NumPy is the single most used data science library;
▪️NumPy and pandas are the most popular combination;
▪️Keras is wildly popular as a deep learning framework, although PyTorch has seen massive growth recently;
▪️Half of all these notebooks have fewer than four markdown cells;
▪️Over a third of these notebooks may fail if you try to run the cells in order.
Unfortunately, if you ever work in data science, especially in ML, you will find that 1/3 is too low an estimate. It should be much more. The work of regular data scientists and engineers, and even papers from big conferences, is usually at least partially unreproducible. Even some of the Titanic dataset experts cannot make their notebooks reproducible.
A good notebook is like a great conversation. Cells should be like sentences. You say something, then add context and arguments.
Link
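One cheap signal of the "fails when run in order" problem can be read straight from the notebook file. A .ipynb file is JSON, and every code cell stores the execution_count it had when it was last run. The sketch below checks whether those counters increase from top to bottom. This is my own heuristic, not the method used in the JetBrains study, and the function name is made up.

```python
import json


def executed_in_order(notebook_json: str) -> bool:
    """Return True if code cells were last executed from top to bottom.

    Cells that were never run (execution_count is null) are ignored.
    """
    notebook = json.loads(notebook_json)
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    executed = [c for c in counts if c is not None]
    return all(a < b for a, b in zip(executed, executed[1:]))


# A tiny hypothetical notebook whose second code cell was run first.
example = json.dumps({
    "cells": [
        {"cell_type": "code", "execution_count": 2, "source": []},
        {"cell_type": "markdown", "source": []},
        {"cell_type": "code", "execution_count": 1, "source": []},
    ]
})
print(executed_in_order(example))  # False
```

A notebook that passes this check can still fail on a clean run, so the only real test is "Restart kernel and run all".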
The JetBrains Blog
We Downloaded 10,000,000 Jupyter Notebooks From Github – This Is What We Learned | The Datalore Blog
Here’s how we used the hundreds of thousands of publicly accessible repos on GitHub to learn more about the current state of data science.
NumPy
This is an open-source library, once separated from the SciPy project. NumPy's core is written in C, and its linear algebra routines rely on BLAS and LAPACK libraries (LAPACK was originally written in Fortran). This compiled implementation makes NumPy a fast library. And because it supports vector operations on multidimensional arrays, it is extremely convenient.
A non-Python alternative to NumPy is MATLAB.
Besides support for multidimensional arrays, NumPy includes a set of packages for solving specialized problems, for example:
▪️numpy.linalg - implements linear algebra operations;
▪️numpy.random - implements functions for dealing with random variables;
▪️numpy.fft - implements direct and inverse Fourier transform.
A guide to NumPy with many nice illustrations
#python
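A short sketch of what this looks like in practice: one vectorized operation on a 2-D array, then one call from each of the three subpackages listed above. The arrays and the seed are arbitrary values chosen for illustration.

```python
import numpy as np

# Vector operations on a multidimensional array, no explicit loops.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([10.0, 20.0])
scaled = a * 2 + b  # b is broadcast across each row of a

# numpy.linalg: solve the linear system a @ x = b.
x = np.linalg.solve(a, b)

# numpy.random: draw samples from a normal distribution.
rng = np.random.default_rng(seed=0)
noise = rng.normal(size=3)

# numpy.fft: direct transform followed by the inverse one.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)
recovered = np.fft.ifft(spectrum).real

print(scaled)
print(np.allclose(a @ x, b))
print(np.allclose(recovered, signal))
```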
Medium
NumPy Illustrated: The Visual Guide to NumPy
Brush up your NumPy or learn it from scratch
An interesting article about why Apache Kafka is so fast and popular. If you work with this technology, read about what Kafka has "under the hood". It explains a lot: record batching, batch compression, buffered operations, and other tricks. Zero-copy is really cool, I had never heard of it before.
If you don't like paywalls, try opening it in a private tab.
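To get a feeling for why batching and batch compression pay off, here is a toy comparison using zlib from the standard library. It is not Kafka code and does not use Kafka's record format or its codecs; the records and function names are invented for the illustration.

```python
import json
import zlib


def compressed_size_individually(records):
    """Total size when every record is compressed on its own."""
    return sum(len(zlib.compress(record)) for record in records)


def compressed_size_as_batch(records):
    """Size when all records are joined and compressed together."""
    return len(zlib.compress(b"\n".join(records)))


# Hypothetical, very similar events, the typical shape of a log stream.
records = [
    json.dumps({"user_id": i, "event": "page_view", "page": "/home"}).encode()
    for i in range(100)
]

print(compressed_size_individually(records))
print(compressed_size_as_batch(records))
```

The batch comes out much smaller because the compressor can reuse what the records have in common, and one large write is also cheaper than a hundred small ones.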
Medium
Why Kafka Is so Fast
Discover the deliberate design decisions that have made Kafka the performance powerhouse it is today.
Testing and validation in ML
Testing is an important part of the software development cycle, perhaps crucial to the delivery of a good product. As a software project grows, dealing with bugs and technical debt can consume all of the team's time if the team doesn't adopt any testing approach. And overall, software testing methodologies seem to me to be well understood.
Machine learning models bring a new set of complexities beyond traditional software. In particular, they depend on data in addition to code. As a result, testing methodologies for machine learning systems are less well understood and less widely applied in practice. Nowadays anyone can call a couple of functions from sklearn and proudly call themselves a data scientist, but relating the results to the real world and validating that the model does reasonable things is quite difficult.
Here is a good talk about the importance of testing in ML, an overview of the types of testing available to ML practitioners, and recommendations on how you can start implementing more robust testing into ML projects.
#ml
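As a rough idea of what "validating that the model does reasonable things" can look like, here is a sketch of three behavioral checks. The model is a hypothetical stand-in (a hand-written price formula), and the test names are mine, not taken from the talk. With a real model you would call its predict function instead.

```python
def predict_price(area_m2: float, rooms: int, listing_id: str) -> float:
    """Hypothetical toy model: price depends on area and rooms only."""
    return 1000.0 * area_m2 + 5000.0 * rooms


def test_invariance_to_listing_id():
    # Changing an irrelevant input must not change the prediction.
    assert predict_price(50, 2, "a") == predict_price(50, 2, "b")


def test_price_grows_with_area():
    # Directional expectation: a bigger flat should not be cheaper.
    assert predict_price(60, 2, "a") > predict_price(50, 2, "a")


def test_output_is_plausible():
    # Sanity range for the output, chosen arbitrarily for this toy example.
    assert 0 < predict_price(50, 2, "a") < 10_000_000


for test in (test_invariance_to_listing_id,
             test_price_grows_with_area,
             test_output_is_plausible):
    test()
print("all behavioral checks passed")
```

Checks like these do not replace accuracy metrics, but they catch the kind of nonsense that an aggregate score can hide.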
YouTube
PyData MTL: "Testing production machine learning systems" by Josh Tobin
"Testing production machine learning systems" by Josh Tobin
Testing is a critical part of the software development cycle. As your software project grows, dealing with bugs and regressions can consume your team if you do not take a principled approach to…
Martin Kleppmann (the guy behind the book Designing Data-Intensive Applications) has made his new 8-lecture university course on distributed systems publicly available.
Link
Forwarded from Data Science, Machine Learning, AI & IOT
Huge repo of courses and resources covering Computer Science, AI, ML, Data Science, Maths and a lot more
#beginner #machinelearning #datascience #github
@kdnuggets @datasciencechats
https://github.com/Developer-Y/cs-video-courses#math-for-computer-scientist
GitHub
GitHub - Developer-Y/cs-video-courses: List of Computer Science courses with video lectures.
List of Computer Science courses with video lectures. - Developer-Y/cs-video-courses