INDEX
Explanations
words related to names and locations
occurrences of specific names or characters, particularly related to a notable figure or term
Negative Logits
DonaldTrump    -0.79
bread          -0.64
idious         -0.63
ulators        -0.62
suppress       -0.62
utterstock     -0.62
orial          -0.62
ifier          -0.62
hander         -0.62
itious         -0.61
Positive Logits
ø              1.19
Ã¥             0.95
¶              0.88
Andersen       0.87
hett           0.87
ĨĴ             0.84
ĺ              0.84
æ              0.83
borg           0.83
Å¡             0.81
Activations Density 0.008%