Explanations
references to the Antifa movement
Negative Logits
ĸļ        -0.80
mercial   -0.73
borne     -0.70
mong      -0.70
morph     -0.69
yrinth    -0.69
Surviv    -0.69
marked    -0.66
Faust     -0.65
itaire    -0.64
Positive Logits
ignt      0.93
ifa       0.87
ÄŁ        0.87
qa        0.79
irm       0.75
ullah     0.74
q         0.74
roud      0.70
ÅŁ        0.69
IELD      0.69
Activations Density: 0.003%