Explanations
phrases related to political statements or claims
Negative Logits
Token     Logit
ernel     -0.15
linger    -0.15
vore      -0.14
κη        -0.14
Zeit      -0.14
ิด        -0.14
грун      -0.14
exile     -0.14
zeit      -0.14
Shirley   -0.13
Positive Logits
Token     Logit
Fake      0.20
Radical   0.17
Rig       0.16
Fake      0.16
476       0.16
談        0.15
cher      0.14
ISTA      0.14
Rip       0.14
okia      0.14
Activations Density 0.049%