Explanations
discussions related to societal issues and regulations affecting citizens and various social groups
Negative Logits
izu         -0.15
byss        -0.14
ulumi       -0.14
ób          -0.13
subur       -0.13
voksen      -0.13
oyer        -0.13
edo         -0.13
mlink       -0.13
ulan        -0.13
Positive Logits
alike       0.25
whom        0.21
who         0.19
/operators  0.19
们          0.18
hips        0.17
/client     0.17
folk        0.16
innen       0.15
’           0.15
Activations Density 0.490%