Explanations
references to specific political figures, particularly former presidents
references to former political leaders and their titles
Negative Logits
yss         -0.67
urat        -0.65
itivity     -0.64
igs         -0.63
beh         -0.63
ickle       -0.62
ointment    -0.61
fingert     -0.61
ciation     -0.61
illy        -0.61
Positive Logits
Yugoslavia   0.79
Desmond      0.74
ザ           0.73
turned       0.69
Olympia      0.65
Biden        0.63
oslov        0.63
Lyndon       0.62
bilt         0.62
ģĸ           0.61
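The two lists above give the tokens whose output logits this feature most decreases and most increases. Dashboards of this kind commonly obtain such lists by projecting the feature's decoder direction onto the model's unembedding matrix and ranking the resulting per-token scores. The sketch below illustrates that approach under stated assumptions: `w_dec`, `W_U`, the toy vocabulary, and the numbers are hypothetical, and the real dashboard may apply extra normalization not shown here.

```python
from typing import List, Tuple

def logit_effects(
    w_dec: List[float], W_U: List[List[float]], vocab: List[str], k: int = 2
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical inputs:
      w_dec: feature decoder direction, length d_model.
      W_U:   unembedding matrix stored as d_model rows, each of length n_vocab.
    """
    d_model = len(w_dec)
    # Score for token t is the dot product of w_dec with column t of W_U.
    scores = [
        sum(w_dec[i] * W_U[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive

# Toy example with made-up values (d_model = 2, four tokens).
vocab = ["Biden", "Lyndon", "ickle", "yss"]
w_dec = [1.0, -0.5]
W_U = [[0.6, 0.5, -0.3, -0.4],
       [0.1, 0.2, 0.4, 0.6]]
neg, pos = logit_effects(w_dec, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Ranking raw dot products keeps the sketch simple; note that a feature can push some logits up and others down simultaneously, which is why the dashboard lists both ends.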
Activation Density: 0.154%
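Activation density is usually read as the percentage of token positions on which the feature fires, so 0.154% corresponds to roughly 1 token in 650. A minimal sketch, assuming "fires" means an activation strictly above zero (the `threshold` parameter and the sample activations are hypothetical):

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical activations: the feature fires on 2 of 1000 tokens.
acts = [0.0] * 998 + [2.3, 0.7]
print(activation_density(acts))  # prints 0.2
```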