INDEX
Explanations
references to individuals involved in controversial political or social contexts
Negative Logits
Token        Logit
deniz        -0.17
interact     -0.17
interacting  -0.16
vs           -0.15
versus       -0.15
allied       -0.14
interacts    -0.14
junto        -0.14
Allied       -0.14
ATALOG       -0.14
Positive Logits
Token        Logit
whom         0.30
871          0.16
_tF          0.15
whose        0.15
mutual       0.15
who          0.15
quien        0.14
tut          0.14
uele         0.14
recip        0.14
Activations Density: 0.364%