Data Science Archive

A high-level lib built on PyTorch. I looked at it a long time ago and hadn't noticed that it is now a repo under the official PyTorch team; worth keeping an eye on.
https://github.com/pytorch/ignite

GitHub - pytorch/ignite: High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.


1.38K views小熊猫, 03:49

Data Science Archive

The second version (v2) of "Do Better ImageNet Models Transfer Better?".
In v1, we used public checkpoints where the ResNet models were trained without regularizers, which is why they performed best in the fixed feature setting. In v2, we retrained everything. Surprisingly, for ImageNet training, the same hyperparameters work well for all models.
In v2, we show that regularization settings for ImageNet training matter a lot for transfer learning on fixed features. ImageNet accuracy now correlates with transfer acc in all settings.
https://arxiv.org/abs/1805.08974

1.46K views小熊猫, 03:52

Data Science Archive

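MedicalTorch has been upgraded to v0.2. It is a PyTorch framework dedicated to medical imaging. I haven't studied it closely; presumably medical images call for different processing than images in other domains. A quick skim of the Model code turned up a reference to "segmentation using deep dilated convolutions".
link: https://www.nature.com/articles/s41598-018-24304-3
The transforms module has many specialized functions. It looks like a high-quality project and deserves a closer look.
link: https://medicaltorch.readthedocs.io/en/stable/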

Nature

Spinal cord gray matter segmentation using deep dilated convolutions

Scientific Reports - Spinal cord gray matter segmentation using deep dilated...

1.77K views小熊猫, edited 04:02

Data Science Archive

Pandas-Bokeh: a tool I had planned to build six months ago, and someone beat me to it. Then again, there is no shortage of tools like this...
link: https://github.com/PatrikHlobil/Pandas-Bokeh

GitHub

GitHub - PatrikHlobil/Pandas-Bokeh: Bokeh Plotting Backend for Pandas and GeoPandas


1.75K views小熊猫, 17:40

Data Science Archive

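A fairly good applied introduction to FM (Factorization Machines), covering typical applications such as recommendation and search. A good way to get to know FFM and FM. https://www.m3tech.blog/entry/2019/01/02/090000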
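For readers new to FM, here is a minimal sketch of the standard second-order FM prediction. It is not the linked post's code. It checks that the naive pairwise sum agrees with the well-known O(kn) reformulation. The function names and the toy values of `x`, `w0`, `w`, and `V` are made up for illustration.

```python
from itertools import combinations


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def fm_predict_naive(x, w0, w, V):
    """w0 + sum_i w_i x_i + sum_{i<j} <V_i, V_j> x_i x_j, computed pair by pair."""
    linear = w0 + dot(w, x)
    pairwise = sum(
        dot(V[i], V[j]) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )
    return linear + pairwise


def fm_predict_fast(x, w0, w, V):
    """Same prediction using 0.5 * sum_f ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)."""
    linear = w0 + dot(w, x)
    pairwise = 0.0
    for f in range(len(V[0])):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        pairwise += 0.5 * (sum(terms) ** 2 - sum(t * t for t in terms))
    return linear + pairwise


# Toy inputs (hypothetical): 3 features, 2 latent factors per feature.
x = [1.0, 0.0, 2.0]
w0 = 0.5
w = [0.1, -0.2, 0.3]
V = [[1.0, 2.0], [0.5, -1.0], [1.0, 0.5]]

naive = fm_predict_naive(x, w0, w, V)
fast = fm_predict_fast(x, w0, w, V)
print(naive, fast)
```

The fast form is what makes FM practical on sparse inputs, since only the non-zero entries of `x` contribute to each sum.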

エムスリーテックブログ

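Implementation and Numerical Verification of Factorization Machines - エムスリーテックブログ

Introduction: Happy New Year. This is Nishiba (@m_nishiba) from the Engineering Group. I work on natural language processing and recommender systems in the AI and machine learning team. After reading "Latest Trends in Deep Factorization Machines (2018)" on Gunosy's data analysis blog, Factorization Machin…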

1.58K views小熊猫, 19:50

Data Science Archive

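A highly parallel Rust implementation of Parabel. https://github.com/tomtung/parabel-rs
About Parabel: https://dl.acm.org/citation.cfm?doid=3178876.3185998
It looks well suited to large-scale classification problems and the performance is outstanding. Saving it for later study.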

GitHub

GitHub - tomtung/omikuji: An efficient implementation of Partitioned Label Trees & its variations for extreme multi-label classification


1.73K views小熊猫, edited 19:53

Data Science Archive

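Several of the more important datasets of 2018. I have used SQuAD2.0, CoQA, HotpotQA, and TencentAI ML myself, and all of them are of fairly high quality.
https://medium.com/syncedreview/2018-in-review-10-open-sourced-ai-datasets-696b3b49801f
I'd also recommend the Chinese embeddings that Tencent AI released a while ago: https://ai.tencent.com/ailab/nlp/embedding.html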

Medium

2018 In Review: 10 Open-Sourced AI Datasets

In a boon to AI researchers, the last year witnessed an unprecedented open-sourcing of large datasets by popular AI research projects.

1.93K views小熊猫, edited 19:57

Data Science Archive

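A nice tool from Uber AI. After playing with it for a day, I find it very well suited to running demos and quick validation; many state-of-the-art approaches can be validated here first. https://uber.github.io/ludwig/
Blog introduction: https://eng.uber.com/introducing-ludwig/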

1.65K views小熊猫, edited 08:59

Data Science Archive

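DVC: a tool for managing data science models. Roughly speaking, it works by combining git with storage such as S3. It is quite useful for multi-person teams and for teams spanning several business units. I used it once while working on a Kaggle competition with teammates and the experience was good. https://github.com/iterative/dvc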
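The "git plus bulk storage" idea can be sketched as a toy: the large file goes into a content-addressed cache (a stand-in for S3), and only a small pointer file would be committed to git. This is a conceptual illustration only, not DVC's actual file format, CLI, or API. The `track` and `restore` helpers and the `.pointer.json` suffix are hypothetical.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a hash-keyed cache and write a small pointer file."""
    digest = hashlib.sha256(data_file.read_bytes()).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():
        shutil.copy2(data_file, cached)
    # The pointer is tiny, so it is the part that would live in git.
    pointer = data_file.with_name(data_file.name + ".pointer.json")
    pointer.write_text(json.dumps({"path": data_file.name, "sha256": digest}))
    return pointer


def restore(pointer: Path, cache_dir: Path) -> Path:
    """Recreate the data file next to its pointer from the cache."""
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(meta["path"])
    shutil.copy2(cache_dir / meta["sha256"], target)
    return target


# Demo in a throwaway directory: track a file, delete it, restore it.
with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    data = workdir / "train.csv"
    data.write_text("id,label\n1,0\n2,1\n")
    pointer = track(data, workdir / "cache")
    data.unlink()
    restored_text = restore(pointer, workdir / "cache").read_text()
```

Keying the cache by content hash means identical files are stored once, and a teammate who checks out the pointer can fetch exactly the matching data version.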

GitHub

GitHub - treeverse/dvc: 🦉 Data Versioning and ML Experiments


1.7K views小熊猫, edited 09:03

Data Science Archive

FAIR's ELF has released a new version of ELF Go, and more Go bots will probably follow. https://facebook.ai/developers/tools/elf
ELF OpenGo: https://research.fb.com/facebook-open-sources-elf-opengo/
LeCun's FB post: https://www.facebook.com/yann.lecun/posts/10155789997817143

1.82K views小熊猫, edited 03:11

Data Science Archive

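Tried out JAX this morning. I had been following it for a while, and yesterday I saw Francois mention it again. In short, it is NumPy + gradients, with GPU acceleration powered by XLA (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/g3doc/overview.md). It might be a good choice if you want to implement some low-level framework. https://github.com/google/jax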
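To make "NumPy + gradients" concrete, here is a minimal sketch, assuming `jax` is installed. `jax.numpy`, `jax.grad`, and `jax.jit` are real JAX APIs. The `loss` function and the toy data are made up for illustration.

```python
import jax
import jax.numpy as jnp


def loss(w, x, y):
    """Mean squared error of a linear model, written in NumPy style."""
    pred = jnp.dot(x, w)
    return jnp.mean((pred - y) ** 2)


# jax.grad differentiates with respect to the first argument (w);
# jax.jit compiles the resulting function through XLA.
grad_loss = jax.jit(jax.grad(loss))

x = jnp.array([[1.0, 2.0], [3.0, 4.0]])
y = jnp.array([1.0, 2.0])
w = jnp.zeros(2)

g = grad_loss(w, x, y)
print(g)
```

The same code runs on CPU or GPU depending on which backend JAX finds, which is where the XLA part comes in.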

2.03K views小熊猫, edited 03:19

Data Science Archive

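First there was StanfordNLP, and now I've come across https://github.com/zalandoresearch/flair, though by now I'm somewhat immune to this kind of tool. After reading some of the source, I think the project's code is quite well written. If you build your own tools, it's worth a look: the more you read, the better you build.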

GitHub

GitHub - flairNLP/flair: A very simple framework for state-of-the-art Natural Language Processing (NLP)


2.28K views小熊猫, 03:25

Data Science Archive

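ignite, a high-level PyTorch API from FAIR. I played with it last night and it is very pleasant to use. Its relationship to PyTorch feels a bit like that of Keras to TF. https://github.com/pytorch/ignite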

GitHub

GitHub - pytorch/ignite: High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.


2.94K views小熊猫, 02:18

Data Science Archive

A spaCy cheat sheet: http://datacamp-community-prod.s3.amazonaws.com/29aa28bf-570a-4965-8f54-d6a541ae4e06

2.15K views小熊猫, 18:15

Data Science Archive

There is also a CS229 cheat sheet: https://stanford.edu/~shervine/teaching/cs-229/

stanford.edu

Teaching - CS 229

Teaching page of Shervine Amidi, Adjunct Lecturer at Stanford University.

2.27K views小熊猫, 18:17

Data Science Archive

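Foundations of Data Science, a text with MSR India ties: the authors are Avrim Blum, John Hopcroft, and Ravindran Kannan, and Kannan is at MSR India. At a glance, the book is of very high quality. https://www.cs.cornell.edu/jeh/book.pdf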

1.99K views小熊猫, 03:48

Data Science Archive

A collection of generative models in TF2 + Keras; everything is available on Colab. https://github.com/timsainb/tensorflow2-generative-models/

GitHub

GitHub - timsainb/tensorflow2-generative-models: Implementations of a number of generative models in Tensorflow 2. GAN, VAE, Seq2Seq…

Implementations of a number of generative models in Tensorflow 2. GAN, VAE, Seq2Seq, VAEGAN, GAIA, Spectrogram Inversion. Everything is self contained in a jupyter notebook for easy export to colab...

2.09K views小熊猫, 04:10

Data Science Archive

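A tutorial on Sequence-Aware Recommender Systems. In my own earlier experiments I also found that session-based RNNs work quite well for recommendation, especially in scenarios with a natural sequential session, such as serialized shows on YouTube, short-video feeds, and so on. https://github.com/mquad/sars_tutorial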

GitHub

GitHub - mquad/sars_tutorial: Repository for the tutorial on Sequence-Aware Recommender Systems held at TheWebConf 2019 and ACM…


2.18K views小熊猫, 04:12

Data Science Archive

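BAMBI is a high-level Python API built on PyMC3. If you often work with Bayesian statistical models, it is worth a try. I have only used PyMC3 so far and plan to give BAMBI a go; hopefully it works well. https://github.com/bambinos/bambi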

GitHub

GitHub - bambinos/bambi: BAyesian Model-Building Interface (Bambi) in Python.


2.39K views小熊猫, edited 01:57

Data Science Archive

Catalyst 19.06rc2 has removed all TensorFlow dependencies and now uses PyTorch exclusively. I haven't tried the new version yet, but dropping TF is good news.
link: https://catalyst-team.github.io/catalyst/index.html
Sergey's introduction: https://docs.google.com/presentation/d/1NQGWb53Kqm-f3hZ2JIoHjX-he3C39eOcSszZzp5o07U/edit#slide=id.p

Google Docs

Catalyst.RL

Catalyst.RL tl;dr Sergey Kolesnikov

2.6K views小熊猫, 06:57

Data Science Archive

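How to manage ML experiment results and models is a perennial question. The tools summarized in this reddit thread are decent, and many of the comments below it are worth reading too.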
https://old.reddit.com/r/MachineLearning/comments/bx0apm/d_how_do_you_manage_your_machine_learning/

r/MachineLearning - [D] How do you manage your machine learning experiments?

184 votes and 68 comments so far on Reddit

2.57K views小熊猫, 02:14
