By the way, this article has a list of links to: longformer, reformer, adaptive attention span, compressive transformer, blockwise transformer, BigBird, linformer..
Two papers that seemed particularly useful to me at the time:
the survey "Efficient Transformers: A Survey" https://arxiv.org/abs/2009.06732 (Google Research)
and
"Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention" https://arxiv.org/abs/2006.16236 (Switzerland + U.Washington)
Thanks! The "Efficient Transformers" survey is interesting. Have you come across any papers on transformers for modeling movements, dance, and the like? (I have already seen many papers applying GRU(/LSTM)+attention to such tasks..)
I don't recall coming across anything like that. (But that doesn't mean much, of course: I haven't tried searching for this topic, and I can no longer keep up with the flow of papers at all, so I only see a small part of what is going on.)
:-) The only thing I remember is the animated visualization from the study "BERTology Meets Biology: Interpreting Attention in Protein Language Models", which evoked in the viewer's/reader's mind an association with something of that sort :-) I have a feeling that the elegance of that animation was the reason I noticed that study back then and still remember it:
https://twitter.com/RichardSocher/status/1278058096481333253
A search for "modeling movement dance with transformers" immediately turns up all sorts of things:
"Learning to Generate Diverse Dance Motions with Transformer" https://arxiv.org/abs/2008.08171
etc...
Generally speaking, it may be that models incorporating oscillations in time will be more successful here.
(In fact, trigonometric activation functions have been used quite fruitfully lately, but I mostly see them in static settings; for dynamics, one probably needs to add periodicity in time explicitly (a toy sketch of what that could look like is below).
Then one can build ordinary models, but with "sinusoidal motifs"; of course, it is tempting to start trying spiking neural nets right away - for rhythmic movements they intuitively seem attractive, and, generally speaking, there has been quite decent progress in that area lately as well.
In short, these two directions - periodic activation functions and spiking neurons - seem promising in this respect.)
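To make "periodic activation functions plus explicit periodicity in time" a bit more concrete, here is a minimal sketch: a tiny sine-activated network (in the spirit of SIREN-style periodic activations) that maps a time stamp, expanded into sinusoidal features at a few assumed base frequencies, to a pose vector. The use of PyTorch, the layer sizes, the frequencies, and the flat pose representation are all illustrative assumptions, not something taken from the papers mentioned above.

```python
# A toy sketch (assumptions: PyTorch, a pose as a flat vector, arbitrary sizes/frequencies).
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Linear layer followed by a sine activation (a 'periodic activation function')."""
    def __init__(self, in_dim, out_dim, omega=30.0):  # omega=30 is the usual SIREN-ish scale
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.omega = omega

    def forward(self, x):
        return torch.sin(self.omega * self.linear(x))

class PeriodicPoseModel(nn.Module):
    """Maps a scalar time t to a pose vector, with explicit periodicity in time:
    t is first expanded into sin/cos features at several assumed base frequencies."""
    def __init__(self, pose_dim=51, n_freqs=8, hidden=128):
        super().__init__()
        # Assumed base frequencies (e.g. harmonics of a ~1 Hz movement rhythm).
        freqs = 2.0 * math.pi * torch.arange(1, n_freqs + 1, dtype=torch.float32)
        self.register_buffer("freqs", freqs)
        self.net = nn.Sequential(
            SineLayer(2 * n_freqs, hidden),
            SineLayer(hidden, hidden),
            nn.Linear(hidden, pose_dim),
        )

    def forward(self, t):                 # t: (batch, 1), time in seconds
        phases = t * self.freqs           # (batch, n_freqs)
        feats = torch.cat([torch.sin(phases), torch.cos(phases)], dim=-1)
        return self.net(feats)            # (batch, pose_dim)

model = PeriodicPoseModel()
t = torch.linspace(0.0, 2.0, steps=16).unsqueeze(-1)   # 16 time stamps over two seconds
poses = model(t)                                        # (16, 51); untrained, so random poses
```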
Yes, here are tons of links I have in my notes for "spiking" on arXiv (they all end in 2019, though; not because the flow of papers decreased, but because I gave up on doing a good job of monitoring the literature and keeping good notes of it):
( ... )
And other, earlier works by Wolfgang Maass, for example, https://arxiv.org/abs/1611.03698
(With spikes, people are hoping for energy-efficient computations... That's the extra motive, in addition to biorealism...
I think this motive started to be more pronounced in recent years, as some of the "breakthrough models" were eating orders and orders of magnitude more energy (at least at the moment of the breakthrough itself; the subsequent optimizations often passed almost unnoticed, so there was an exaggerated impression that ever more energy was fundamentally necessary, rather than just needed to be first past the post in the research competition), and as "climate change concerns" became more pronounced in public opinion and in the minds of funding agencies.)
(Especially the point about temporal sparseness as an accelerator of learning - and it's also very interesting who the authors of that one are...
It might be that this is another piece of their work that has been ignored by the machine learning community, just like the 'ReLU induces the sparse structure' paper from the same group: Hahnloser, R.; Sarpeshkar, R.; Mahowald, M. A.; Douglas, R. J.; Seung, H. S. (2000). "Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit". Nature 405 (6789): 947-951, https://www.nature.com/articles/35016072, which was ignored until it was independently rediscovered 11 years later in Xavier Glorot, Antoine Bordes and Yoshua Bengio (2011), "Deep sparse rectifier neural networks", http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf, after which ReLU quickly became the most popular activation function, surpassing
( ... )
Yeah, one other motive from old Izhikevich was a two-variable model of the neuron (instead of the one-variable leaky integrate-and-fire), which had a fast variable and a slow variable (I think the point was that ion channels for different ions have different time constants, and some of the biorealistic dynamics depend on that, so instead of the overall membrane potential as a single variable, the sodium and potassium contributions were tracked as separate variables); I wonder if anyone has made a machine learning model out of that...
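For reference, the two-variable model being recalled here sounds like Izhikevich's 2003 "simple model": a fast variable v (the membrane potential) and a slow recovery variable u, with dv/dt = 0.04v^2 + 5v + 140 - u + I, du/dt = a(bv - u), and a reset v <- c, u <- u + d after each spike. Below is a minimal simulation sketch; the "regular spiking" parameter set and the constant input current are just one standard illustrative choice. (A leaky integrate-and-fire neuron, by contrast, tracks only the single variable v.)

```python
# Sketch of Izhikevich's two-variable spiking neuron (2003 "simple model"):
#   dv/dt = 0.04*v**2 + 5*v + 140 - u + I   (fast variable: membrane potential, mV)
#   du/dt = a*(b*v - u)                     (slow recovery variable)
#   if v >= 30 mV: record a spike, then v <- c, u <- u + d
# The "regular spiking" parameters (a=0.02, b=0.2, c=-65, d=8) and the constant
# input current below are illustrative choices, not tied to any paper above.
import numpy as np

def simulate_izhikevich(T_ms=1000.0, dt=0.5, I=10.0, a=0.02, b=0.2, c=-65.0, d=8.0):
    steps = int(T_ms / dt)
    v, u = c, b * c                      # start at the resting point
    spikes, trace = [], np.empty(steps)
    for k in range(steps):
        if v >= 30.0:                    # spike peak reached: record and reset
            spikes.append(k * dt)
            v, u = c, u + d
        # forward-Euler integration of the fast and slow variables
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        trace[k] = v
    return np.array(spikes), trace

spike_times, v_trace = simulate_izhikevich()
print(f"{len(spike_times)} spikes in 1 s of simulated time")
```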
*******
I am also doing only a bit at the moment. I maintain this document:
https://www.cs.brandeis.edu/~bukatin/dmm-collaborative-research-agenda.pdf
and I hope to find people who want to collaborate on some of that (I also tinker a bit with some of the pieces from there myself, for the last year entirely on my own; before that there was various joint work).
It should probably be mentioned here that (together with the slow decay of eligibility traces) the model of Izhikevich needs precise spike timing on a millisecond scale. Soltoggio formulated a similar model that works with varying firing rates, not based on exact arrangements of spikes. Soltoggio's mechanisms rely on the rarity of correlating neural activity (temporal sparseness), which generates rare eligibility traces. (By reducing the rate at which traces are generated, the traces can have longer decays.) https://direct.mit.edu/neco/article/25/4/940/7874/ Soltoggio argues that the spike coincidence used by Izhikevich is not necessary; it is just one of the possible mechanisms for detecting correlated activity. Robot: https://pub.uni-bielefeld.de/record/2547895
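A rough sketch of the mechanism as described in this paragraph: rarely correlating, rate-based activity generates eligibility traces, the traces decay slowly (affordable precisely because they are generated rarely), and a later modulatory/reward signal converts them into weight changes. This is only a paraphrase of the idea for illustration, not Soltoggio's exact equations; the rarity threshold, decay constant, and learning rate below are made-up values.

```python
# Sketch: rare correlations -> slowly decaying eligibility traces -> modulated weight update.
# Rate-based paraphrase of the paragraph above (not the paper's exact formulation);
# theta (rarity threshold), tau_e (trace decay) and eta (learning rate) are made-up values.
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post = 20, 5
W = 0.1 * rng.standard_normal((n_post, n_pre))        # synaptic weights
E = np.zeros_like(W)                                   # eligibility traces
theta, tau_e, eta, dt = 2.0, 4000.0, 0.01, 1.0         # tau_e in ms: a slow decay is affordable
                                                       # because traces are generated rarely
for t in range(10000):                                 # 10 s of 1 ms steps
    pre = rng.random(n_pre)                            # pre-synaptic firing rates (toy input)
    post = np.tanh(W @ pre)                            # post-synaptic activity
    hebb = np.outer(post, pre)                         # instantaneous Hebbian correlation
    rare = hebb - hebb.mean()                          # deviation from the typical correlation
    gated = np.where(np.abs(rare) > theta * rare.std(), np.sign(rare), 0.0)
    # only unusually strong (rare) correlations leave a trace; traces decay slowly
    E = E * np.exp(-dt / tau_e) + gated
    reward = 1.0 if t % 2500 == 2499 else 0.0          # sparse, delayed modulatory signal
    W += eta * reward * E                              # traces are turned into weight changes

print("mean |eligibility trace|:", np.abs(E).mean())
```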