INDEX
Explanations
references to specific foreign names and locations
special characters or symbols in the text, particularly those resembling letters with diacritics
Negative Logits
DonaldTrump    -0.72
bread          -0.69
idious         -0.67
itious         -0.67
bearer         -0.65
crush          -0.65
utterstock     -0.65
icity          -0.65
iple           -0.65
enegger        -0.64
Positive Logits
ø       1.06
¶       0.95
ĺ       0.95
ħ       0.89
¸       0.89
¬       0.87
Å¡      0.86
hett    0.86
ĨĴ      0.86
rn      0.86
Activations Density: 0.007%