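First there was StanfordNLP, and now I've come across https://github.com/zalandoresearch/flair, though by now I've grown somewhat immune to this kind of wheel. I read through some of the source and the code is quite well written. If you're building your own wheel, it's worth a look: the more you read, the better you build.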
GitHub
GitHub - flairNLP/flair: A very simple framework for state-of-the-art Natural Language Processing (NLP)
A very simple framework for state-of-the-art Natural Language Processing (NLP) - flairNLP/flair
ignite, a high-level PyTorch API from FAIR. I played with it last night and it's very pleasant to use. Its relationship to PyTorch feels a bit like Keras's to TF. https://github.com/pytorch/ignite
GitHub
GitHub - pytorch/ignite: High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.
High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently. - pytorch/ignite
Foundations of Data Science, material from MSR India; the author is MSR India's Data Science Lead. Even at a glance, the quality of the book is very high. https://www.cs.cornell.edu/jeh/book.pdf
A collection of generative models in TF2 + Keras; everything is available on Colab. https://github.com/timsainb/tensorflow2-generative-models/
GitHub
GitHub - timsainb/tensorflow2-generative-models: Implementations of a number of generative models in Tensorflow 2. GAN, VAE, Seq2Seq…
Implementations of a number of generative models in Tensorflow 2. GAN, VAE, Seq2Seq, VAEGAN, GAIA, Spectrogram Inversion. Everything is self contained in a jupyter notebook for easy export to colab...
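A tutorial on Sequence-Aware Recommender Systems. In my own earlier experiments I also found that session-based RNNs work quite well for recommendation, especially in scenarios with clear sequential sessions, such as YouTube series or short-video feeds. https://github.com/mquad/sars_tutorial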
GitHub
GitHub - mquad/sars_tutorial: Repository for the tutorial on Sequence-Aware Recommender Systems held at TheWebConf 2019 and ACM…
Repository for the tutorial on Sequence-Aware Recommender Systems held at TheWebConf 2019 and ACM RecSys 2018 - GitHub - mquad/sars_tutorial: Repository for the tutorial on Sequence-Aware Recommend...
BAMBI is a high-level Python API built on PyMC3. If you often work with Bayesian statistical models, give it a try. I've only used PyMC3 so far and plan to try BAMBI; hopefully it's good. https://github.com/bambinos/bambi
GitHub
GitHub - bambinos/bambi: BAyesian Model-Building Interface (Bambi) in Python.
BAyesian Model-Building Interface (Bambi) in Python. - bambinos/bambi
Catalyst 19.06rc2 has dropped its TensorFlow dependency entirely and now runs on PyTorch alone. I haven't tried the new version yet, but getting rid of TF is good news.
link: https://catalyst-team.github.io/catalyst/index.html
Sergey's introduction: https://docs.google.com/presentation/d/1NQGWb53Kqm-f3hZ2JIoHjX-he3C39eOcSszZzp5o07U/edit#slide=id.p
Google Docs
Catalyst.RL
Catalyst.RL tl;dr Sergey Kolesnikov
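How to manage ML experiment results and models is a perennial question. The tools summarized in this reddit thread are pretty good, and many of the comments under it are worth reading too.
https://old.reddit.com/r/MachineLearning/comments/bx0apm/d_how_do_you_manage_your_machine_learning/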
reddit
r/MachineLearning - [D] How do you manage your machine learning experiments?
184 votes and 68 comments so far on Reddit
Forwarded from AirOnG
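https://github.com/PacktPublishing/Hands-On-Data-Structures-and-Algorithms-with-Rust Getting started with data structures and algorithms in Rust. Data structures and algorithms are fundamentals that every programming language has to deal with. Because of Rust's distinctive ownership model, implementing them takes some technique, and doing so gives you a better feel for what makes the language unique. This repo holds all the example code from the book; it works both as an introduction and as a reference for how specific algorithms are written.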
GitHub
GitHub - PacktPublishing/Hands-On-Data-Structures-and-Algorithms-with-Rust: Hands-On Data Structures and Algorithms with Rust,…
Hands-On Data Structures and Algorithms with Rust, published by Packt - PacktPublishing/Hands-On-Data-Structures-and-Algorithms-with-Rust
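Lately I've been looking at the corpus serialization part of some NLP projects: http://matthewrocklin.com/blog/work/2015/03/16/Fast-Serialization
The post is a bit old, but the experiments section is still worth a look.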
Matthewrocklin
Efficiently Store Pandas DataFrames
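Voila is a new Jupyter plugin for visualization that turns a notebook directly into a standalone web app. I tried it and it's pretty good, though it gets a bit laggy with large amounts of data. These days I personally prefer plotly's Dash: it looks nicer, and the generated HTML is easier to embed in other documentation pages, such as Python Sphinx. Still, it's one more option: https://blog.jupyter.org/a-gallery-of-voil%C3%A0-examples-a2ce7ef99130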
Medium
A Gallery of Voilà Examples
Voilà is one of the latest addition to the Jupyter ecosystem, and can be used to turn notebooks into standalone applications and…
A resource list of Data Visualization Style Guidelines, carefully curated by the author. https://medium.com/data-visualization-society/style-guidelines-92ebe166addc
This spreadsheet has a great deal of detail, including how to choose the right chart and style. Some entries even cover when to use each color. Quite interesting.
https://docs.google.com/spreadsheets/d/1F1gm5QLXh3USC8ZFx_M9TXYxmD-X5JLDD0oJATRTuIE/edit#gid=1679646668
Medium
What Are Data Visualization Style Guidelines?
Data visualization style guides are standards for formatting and designing representations of information.
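Today a friend on Twitter asked me for introductory material on AutoML. I thought back to this survey from 4Paradigm that I'd read; they have long been hosting the AutoML Challenge at KDD Cup/NIPS. It's also the best-written introductory survey I've read: submitted in November 2018 and last revised in January 2019, so it's both current and comprehensive. https://arxiv.org/abs/1810.13306
A lot of AutoML work focuses on hyperparameter tuning. Although I often find it less vivid than CV/NLP, it has its own distinct appeal and strong practical value.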
arXiv.org
Automated Machine Learning: From Principles to Practices
Machine learning (ML) methods have been developing rapidly, but configuring and selecting proper methods to achieve a desired performance is increasingly difficult and tedious. To address this...
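Chip Huyen is a Stanford instructor of Vietnamese descent whom I like a lot. Her blog posts and courses are very high quality, and her projects are all quite interesting. This is her blog: https://huyenchip.com/
What I want to share this time, though, is her Twitter thread of notes on ML eng/Data Scientist interviews. It's packed with information, and it looks like the thread will keep being updated until it's compiled into a book: https://twitter.com/chipro/status/1152077188985835521
The replies under each tweet are also well worth reading.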
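On the perennial problem of parallelizing Pandas apply/groupby: I've always found dask awkward to use because you have to keep converting back and forth. I just found a simple, easy-to-use tool. https://github.com/nalepae/pandarallel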
GitHub
GitHub - nalepae/pandarallel: A simple and efficient tool to parallelize Pandas operations on all available CPUs
A simple and efficient tool to parallelize Pandas operations on all available CPUs - nalepae/pandarallel
The RAdam + LookAhead experimental results are still a bit odd; the picture isn't very clear. Here's an implementation using fastai. https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d
Medium
New Deep Learning Optimizer, Ranger: Synergistic combination of RAdam + LookAhead for the best of both.
A new paper in part by the famed deep learning researcher Geoffrey Hinton introduces the LookAhead optimizer(“LookAhead optimizer: k steps…
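For anyone who hasn't looked at what LookAhead actually does, here is a minimal sketch of the update rule: take k steps with fast weights, then pull the slow weights part of the way toward them and restart from there. This is not the Ranger or fastai implementation. To keep it dependency-free, plain gradient descent stands in for RAdam as the inner optimizer, and the function names, the toy quadratic objective, and the default values are all my own assumptions.

```python
def lookahead_sgd(grad, w0, lr=0.1, k=5, alpha=0.5, outer_steps=20):
    """Lookahead wrapped around plain gradient descent (illustrative only).

    grad: function mapping a list of weights to a list of gradients.
    k: number of fast (inner) steps per slow update.
    alpha: interpolation factor for the slow weights.
    """
    slow = list(w0)
    for _ in range(outer_steps):
        fast = list(slow)  # fast weights restart from the slow weights
        for _ in range(k):
            g = grad(fast)
            fast = [w - lr * gi for w, gi in zip(fast, g)]
        # slow weights move a fraction alpha toward where the fast weights ended up
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return slow


def quad_grad(w):
    # Gradient of the toy objective f(w) = sum((w_i - 3)^2), minimized at w_i = 3.
    return [2.0 * (wi - 3.0) for wi in w]


final_w = lookahead_sgd(quad_grad, [0.0, 10.0])
print(final_w)
```

On this toy objective both coordinates end up close to 3. It says nothing about how the combination behaves on real networks, which is exactly the part I find unclear.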
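Last week, while building a wheel for a CTR project, I systematically revisited methods and tools for hyperparameter optimization of models that aren't complex DNNs, and found a new tool: Optuna https://github.com/pfnet/optuna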
GitHub
GitHub - optuna/optuna: A hyperparameter optimization framework
A hyperparameter optimization framework. Contribute to optuna/optuna development by creating an account on GitHub.
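Recently, while using unsupervised methods for dimensionality reduction, I found that on categorical features MCA is sometimes better than traditional PCA (though sometimes doing target encoding first and then ordinary PCA also works well). I've been using Prince for a while: simple, easy to use, and decent performance. https://github.com/MaxHalford/Prince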
GitHub
GitHub - MaxHalford/prince: :crown: Multivariate exploratory data analysis in Python — PCA, CA, MCA, MFA, FAMD, GPA
:crown: Multivariate exploratory data analysis in Python — PCA, CA, MCA, MFA, FAMD, GPA - MaxHalford/prince
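This evening a friend who saw the post asked me why you'd do target encoding on categorical features. It really depends on the model, but for the tree-based models commonly used on tabular data, OHE performs rather poorly. With xgboost you need to do target encoding yourself; CatBoost/LightGBM don't need it because they handle categorical features natively. https://medium.com/data-design/visiting-categorical-features-and-encoding-in-decision-trees-53400fa65931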
Medium
Visiting: Categorical Features and Encoding in Decision Trees
When you have categorical features and you are using decision trees, you often have a major issue: how to deal with categorical features?
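To make "target encoding" concrete, here is a minimal sketch in plain Python: each category is replaced by a smoothed mean of the target over the rows in that category, and unseen categories fall back to the global mean. The function names, the `smoothing` parameter, and the toy click data are my own assumptions for illustration. This is not how any particular library implements it.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=10.0):
    """Learn a category -> smoothed target mean mapping (illustrative sketch)."""
    if len(categories) != len(targets) or not targets:
        raise ValueError("categories and targets must be non-empty and equal length")
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    # Rare categories are shrunk toward the global mean; frequent ones keep their own mean.
    mapping = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }
    return mapping, global_mean


def apply_target_encoding(categories, mapping, global_mean):
    # Categories never seen during fitting fall back to the global mean.
    return [mapping.get(c, global_mean) for c in categories]


cities = ["a", "a", "a", "b", "b", "c"]
clicked = [1, 1, 0, 0, 0, 1]
mapping, prior = fit_target_encoding(cities, clicked, smoothing=2.0)
encoded = apply_target_encoding(cities + ["d"], mapping, prior)
print(mapping, encoded[-1])
```

The point for tree models is that one numeric column replaces many sparse OHE columns. In real use the mapping should be fit on training folds only, since encoding a row with its own label leaks the target.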