INDEX
Explanations
direct quotes from individuals
Negative Logits
s               -0.69
ه               -0.32
sburg           -0.31
sian            -0.29
ska             -0.28
a               -0.26
ς               -0.25
sand            -0.24
न               -0.23
sik             -0.23
Positive Logits
atre             0.16
wahl             0.15
ertest           0.15
odore            0.15
bsites           0.14
geber            0.14
gether           0.14
بوابة            0.14
.Abstractions    0.14
دواج             0.14
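Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The sketch below assumes that approach; the names `decoder_row`, `unembed`, and `vocab` are hypothetical, the toy weights are not this feature's real weights, and an actual dashboard pipeline may also fold in a final layer norm or bias that is omitted here.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_row: Sequence[float],          # feature decoder direction, length d_model (hypothetical)
    unembed: Sequence[Sequence[float]],    # unembedding matrix, shape [d_model][vocab] (hypothetical)
    vocab: Sequence[str],
) -> List[Tuple[str, float]]:
    """Return (token, effect) pairs, where effect = decoder_row . unembed[:, j]."""
    d_model = len(decoder_row)
    effects = []
    for j, tok in enumerate(vocab):
        score = sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        effects.append((tok, score))
    return effects


def top_and_bottom(
    effects: List[Tuple[str, float]], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split effects into the k most negative and the k most positive tokens."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


if __name__ == "__main__":
    # Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
    effects = logit_effects(
        decoder_row=[1.0, -0.5],
        unembed=[[0.2, -0.4, 0.1], [0.6, 0.2, -0.2]],
        vocab=["a", "b", "c"],
    )
    negative, positive = top_and_bottom(effects, k=1)
    print("Negative:", negative)
    print("Positive:", positive)
```

In the toy run, token "b" has the most negative effect (about -0.5) and token "c" the most positive (about 0.2); on a real model the same ranking over the full vocabulary would yield ten-entry lists like the ones shown.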
Activations Density: 0.086%
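Activation density is usually read as the fraction of tokens on which the feature fires at all. The sketch below assumes "fires" means an activation strictly above zero; the `threshold` parameter and the function name are illustrative assumptions, not taken from the dashboard. Under that reading, 0.086% corresponds to roughly 86 active tokens per 100,000.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed 0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


if __name__ == "__main__":
    # Hypothetical sample: the feature is active on 86 of 100,000 tokens.
    acts = [0.0] * 99_914 + [1.3] * 86
    print(f"Activations Density: {activation_density(acts):.3%}")
```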