Explanations
the occurrence of verbs indicating the beginning of actions or processes
Negative Logits
llib         -0.16
ulence       -0.15
oops         -0.15
abama        -0.15
/bin         -0.14
riet         -0.14
vit          -0.14
at           -0.14
hawk         -0.14
ought        -0.13
Positive Logits
nings        0.21
urat         0.17
ãĤº          0.15
icont        0.15
/end         0.15
_unregister  0.14
itez         0.14
ãĥ³ãĥĩ       0.14
quet         0.14
FTA          0.14
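Two of the positive-logit tokens (ãĤº and ãĥ³ãĥĩ) look like mojibake, but they are probably the raw vocabulary strings of a byte-level BPE tokenizer. The sketch below assumes the model uses the GPT-2-style byte-to-character table, which this page does not state. The helper names `bytes_to_unicode` and `decode_token` are illustrative and are not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable character table (assumed GPT-2-style byte-level BPE)."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes are shifted to code points 256, 257, ... in byte order.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed vocabulary string back into the text it encodes."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # Tokens can end mid-character, so undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤº", "ãĥ³ãĥĩ", "/end"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the two tokens decode to the katakana fragments ズ and ンデ, while ASCII tokens such as /end pass through unchanged.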
Activations Density 0.055%