INDEX
Explanations
words related to political figures and events
Negative Logits
Token         Logit
IUM           -0.80
lished        -0.58
pressures     -0.57
¶ħ            -0.57
popcorn       -0.56
hazards       -0.55
surface       -0.55
calibrated    -0.55
Downs         -0.54
Ĥİ            -0.54
Positive Logits
Token         Logit
Allah         0.82
wife          0.80
hood          0.78
daughter      0.77
abet          0.73
brother       0.73
backer        0.71
chief         0.71
aiden         0.70
son           0.70
Activations Density: 0.069%