INDEX
Explanations
statements involving verbal claims or communications about events
Negative Logits
habit    -0.17
steen    -0.15
ICC      -0.15
assin    -0.14
oref     -0.14
latin    -0.14
chein    -0.14
batis    -0.14
ADE      -0.14
heid     -0.14
Positive Logits
iant       0.14
ÏĢλ        0.14
UNCH       0.13
wn         0.13
ONTAL      0.13
ç̬         0.13
konuÅŁtu   0.13
æķ´        0.13
leur       0.13
hai        0.13
Activations Density: 0.104%