Explanations
temporal indicators related to actions and events
Negative Logits
Token        Logit
(blank)      -0.19
cco          -0.14
clud         -0.13
;            -0.13
iry          -0.13
if           -0.13
â            -0.13
Evel         -0.12
iki          -0.12
æĥij          -0.12
Positive Logits
Token        Logit
же           0.21
istrovstvÃŃ  0.17
cela         0.16
maal         0.16
ìĿ´ëĬĶ       0.15
his          0.15
váºŃy        0.15
ìĿ´ë¥¼       0.14
entai        0.14
these        0.14
Activations Density 0.894%