INDEX
Explanations
phrases indicating the presence or occurrence of events or changes over time
Negative Logits
udeau      -0.17
angel      -0.16
utenberg   -0.15
mtree      -0.15
alaxy      -0.15
ZR         -0.14
adlo       -0.14
leared     -0.14
nothing    -0.14
orp        -0.14
Positive Logits
been       0.30
been       0.26
Been       0.22
BEEN       0.21
Been       0.21
byl        0.19
být        0.19
was        0.19
byla       0.18
بوده       0.17
Activations Density 0.016%