INDEX
Explanations
references to specific groups or entities in a political or social context
Negative Logits
become      -0.20
becomes     -0.16
isko        -0.16
成为        -0.15
邦          -0.15
undergo     -0.15
ESA         -0.15
weren       -0.15
bec         -0.15
assis       -0.15
Positive Logits
426         0.18
Trails      0.17
trails      0.16
正在        0.16
đang        0.16
üzel        0.15
faces       0.15
está        0.14
fen         0.14
possesses   0.14
Activations Density 0.423%