Comments | anhinga_anhinga: 9 months since GPT-3 revolution

anhinga_anhinga

9 months since GPT-3 revolution

Feb 28, 2021 03:56

On May 28, 2020 OpenAI published the GPT-3 paper, "Language Models are Few-Shot Learners", https://arxiv.org/abs/2005.14165 ( ... )

am March 6 2021, 00:40:30 UTC

> "efficient transformers", "vision transformers"

By the way, this article has a list of links to:
longformer, reformer, adaptive attention span,
compressive transformer, blockwise transformer,
BigBird, linformer..

anhinga_anhinga March 6 2021, 02:19:31 UTC

Ah, that's useful, thanks! And the fact that they are doing it in Russian, even if only a small version, is good too...

Two papers that I found especially useful at the time:

the survey "Efficient Transformers: A Survey" https://arxiv.org/abs/2009.06732 (Google Research)

and

"Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention" https://arxiv.org/abs/2006.16236 (Switzerland + U.Washington)

am March 6 2021, 02:44:25 UTC

Thanks! An interesting survey, "Efficient Transformers".
Have you come across any papers on transformers for modeling movement, dance, and the like?
(I have already seen many papers on applying GRU(/LSTM)+attention to this kind of thing..)

anhinga_anhinga March 6 2021, 03:24:21 UTC

I don't remember coming across anything like that. (But that means nothing, of course: I haven't tried searching on this topic, and I can no longer keep up with the flow of papers at all, so I see only a small part of what is going on.)

:-) I only remember the animated visualization from the study "BERTology Meets Biology: Interpreting Attention in Protein Language Models", which evoked an association with something like that in the viewer's/reader's mind :-) I have a feeling that the elegance of this animation was the reason I noticed that study at the time and still remember it:

https://twitter.com/RichardSocher/status/1278058096481333253

anhinga_anhinga March 6 2021, 03:27:40 UTC

But yes, a Google search for

modeling movement dance with transformers

immediately turns up all sorts of things:

"Learning to Generate Diverse Dance Motions with Transformer" https://arxiv.org/abs/2008.08171

etc...

am March 6 2021, 17:23:04 UTC

Yes, thanks. I'll look some more. So far the results are so-so, and those earlier RNNs that the authors compete against produce really bad animations ( ... )

anhinga_anhinga March 6 2021, 21:18:08 UTC

Thanks, that's useful.

Generally speaking, perhaps models that include oscillations in time would be more successful here.

(In general, trigonometric activation functions have been used quite fruitfully lately, but I mostly see them in static settings; for dynamics, one probably needs to add periodicity in time explicitly.

From there one can build ordinary models, but with "sinusoidal motifs"; but, of course, one wants to start trying spiking neural nets right away. For rhythmic movements they seem intuitively attractive, and, generally speaking, there has been very decent progress in this area lately as well.

In short, these two directions, periodic activation functions and spiking neurons, seem promising in this respect.)
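To make the "explicit periodicity in time" idea a bit more concrete, here is a minimal sketch, not taken from any of the papers mentioned in this thread. The names `sine_activation`, `time_features`, `tiny_layer`, the frequency scale `omega`, and the choice of periods are all assumptions made up for illustration: time is encoded as (sin, cos) pairs at chosen periods, and a single neuron applies a sine activation to a weighted sum of those features.

```python
import math

def sine_activation(x, omega=1.0):
    """Periodic activation sin(omega * x); omega is a hypothetical frequency scale."""
    return math.sin(omega * x)

def time_features(t, periods):
    """Encode time t as one (sin, cos) pair per period, so the model
    receives explicitly periodic functions of time as inputs."""
    feats = []
    for p in periods:
        phase = 2.0 * math.pi * t / p
        feats.append(math.sin(phase))
        feats.append(math.cos(phase))
    return feats

def tiny_layer(inputs, weights, bias=0.0, omega=1.0):
    """A single neuron: weighted sum followed by the sine activation."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sine_activation(s, omega)

# Illustrative values only: a "beat" of 1.0 s and a half-beat of 0.5 s.
periods = [1.0, 0.5]
weights = [0.8, 0.0, 0.3, 0.0]

# Two time points exactly one beat apart give (numerically) the same output.
out_a = tiny_layer(time_features(0.25, periods), weights)
out_b = tiny_layer(time_features(1.25, periods), weights)
```

With inputs like these, anything computed downstream is periodic in time by construction, which is one simple way to read "adding periodicity explicitly"; a real motion model would of course combine such features with pose inputs and learned weights.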

am March 6 2021, 21:49:47 UTC

Например, https://arxiv.org/abs/1803.09574
и другие более ранние работы Wolfgang Maass
например, https://arxiv.org/abs/1611.03698

anhinga_anhinga March 6 2021, 23:00:56 UTC

Yes, here are tons of links I have in my notes for "spiking" in arxiv (but they all end in 2019, and not because the flow of papers decreased, but because I gave up on doing a good job monitoring the literature and keeping good notes of it ( ... )

am March 6 2021, 23:43:45 UTC

Thanks a lot for sharing the links! I copied a lot of them.

anhinga_anhinga March 6 2021, 23:10:29 UTC

(With spikes, people are hoping for energy-efficient computations... That's the extra motive, in addition to biorealism...

I think this motive started to be more pronounced in recent years, as some of the "breakthrough models" were eating orders and orders of magnitude more energy (at least, at the moment of breakthrough itself; the subsequent optimizations often passed almost unnoticed, so there was an exaggerated impression that more and more energy was fundamentally necessary, and not just to become first-past-the-post in research competition), and as "climate change concerns" became more pronounced in public opinion and in the minds of funding agencies.)

am March 6 2021, 23:45:51 UTC

As I remember, what was the interest of W.Maass et al ( ... )

anhinga_anhinga March 7 2021, 00:41:31 UTC

Interesting...

(Especially, about temporal sparseness as an accelerator of learning - and it's also very interesting who the authors of that one are...

It might be that this is another of their works which has been ignored by the machine learning community, just like the 'ReLU induces the sparse structure' paper from the same group, Hahnloser, R.; Sarpeshkar, R.; Mahowald, M. A.; Douglas, R. J.; Seung, H. S. (2000). "Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit". Nature. 405 (6789): 947-951 https://www.nature.com/articles/35016072 was ignored till it was independently rediscovered 11 years later in Xavier Glorot, Antoine Bordes and Yoshua Bengio (2011) "Deep sparse rectifier neural networks", http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf after which ReLU quickly became the most popular activation function surpassing ( ... )

am March 7 2021, 01:28:12 UTC

Sorry for the re-editing of comments, the posting ( ... )

anhinga_anhinga March 7 2021, 01:53:54 UTC

Re-editing is fine - don't worry!

Yeah, one other motive from old Izhikevich was a two-variable model of a neuron (instead of a one-variable leaky integrate-and-fire), which had a fast variable and a slow variable (I think the case was that ion channels for different ions had different time constants, and some of the biorealistic dynamics depended on that, so instead of the overall membrane potential as a variable, sodium potential and potassium potential were separate variables); I wonder if anyone did a machine learning model from that...
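For reference, here is a minimal sketch of a two-variable neuron of this kind, assuming the widely quoted form of Izhikevich's "simple model" with its usual "regular spiking" parameter values. In that form the fast variable `v` is the membrane potential and the slow variable `u` is an abstract recovery variable, not a specific ion potential. The function name `simulate_izhikevich`, the forward-Euler integration, the step size, and the input current `I` are illustrative assumptions.

```python
def simulate_izhikevich(I=10.0, a=0.02, b=0.2, c=-65.0, d=8.0,
                        t_max_ms=500.0, dt=0.25):
    """Forward-Euler simulation of a two-variable spiking neuron.

    v: fast variable (membrane potential, mV)
    u: slow recovery variable
    Returns the list of spike times in ms.
    """
    v = c          # start at the reset potential
    u = b * v      # start the recovery variable at its matching value
    spike_times = []
    for k in range(int(t_max_ms / dt)):
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + I   # fast dynamics
        du = a * (b * v - u)                          # slow dynamics
        v += dt * dv
        u += dt * du
        if v >= 30.0:                 # spike: record it, then reset
            spike_times.append(k * dt)
            v = c
            u += d
    return spike_times

spikes_driven = simulate_izhikevich(I=10.0)  # constant input current
spikes_silent = simulate_izhikevich(I=0.0)   # no input
```

The interplay of the two time scales is what shapes the spiking pattern: each spike bumps `u` by `d`, and `u` then relaxes slowly at rate `a`, which spaces out the following spikes.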

*******

I am also doing only a bit at the moment. I maintain this document:

https://www.cs.brandeis.edu/~bukatin/dmm-collaborative-research-agenda.pdf

and I hope to find people who want to collaborate on some of that (and I tinker a little with some pieces of it myself, entirely alone for the past year; before that there was all kinds of joint activity).

am March 30 2021, 21:23:26 UTC

It should probably be mentioned here that (together
with the slow decay of eligibility traces) the model of
Izhikevich needs precise spike timing on a millisecond scale.
Soltoggio formulated a similar model that works with varying
firing rates and is not based on exact arrangements of spikes.
Soltoggio's mechanisms rely on the rarity of correlating
neural activity (temporal sparseness), which generates
rare eligibility traces. (By reducing the rate at which
traces are generated, traces can have longer decays.)
https://direct.mit.edu/neco/article/25/4/940/7874/
Soltoggio argues that the spike coincidence used by
Izhikevich is not necessary; it is just one of the
possible mechanisms to detect correlated activity.
Robot: https://pub.uni-bielefeld.de/record/2547895
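A rough sketch of the general idea, as I understand it, and not the actual rule from the paper: a rate-based eligibility trace that is incremented only when the product of pre- and postsynaptic activity exceeds a threshold, and that a later reward converts into a weight change. The names `update_trace`, `weight_change`, `count_trace_events`, the threshold `theta`, the `decay` factor, and the uniform random activity are all assumptions chosen for illustration.

```python
import random

def update_trace(trace, pre_rate, post_rate, theta, decay):
    """Decay the eligibility trace, then increment it only when the
    pre/post activity product exceeds the threshold theta."""
    trace *= decay
    if pre_rate * post_rate > theta:
        trace += 1.0
    return trace

def weight_change(trace, reward, lr=0.01):
    """Reward-modulated update: a (possibly delayed) reward turns
    whatever trace is still alive into a weight change."""
    return lr * reward * trace

def count_trace_events(theta, steps=1000, seed=0):
    """Count how often random activity in [0, 1) creates a trace increment."""
    rng = random.Random(seed)
    events = 0
    for _ in range(steps):
        if rng.random() * rng.random() > theta:
            events += 1
    return events

# A high threshold makes correlation events (and hence traces) rare.
rare = count_trace_events(theta=0.9)
common = count_trace_events(theta=0.1)
```

When events are rare, a slowly decaying trace (a `decay` close to 1) is less likely to be swamped by unrelated increments before the reward arrives, which is the trade-off described in the parenthetical remark above.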
