Explanations
words with special characters, such as ø
occurrences of the character "ø"
Negative Logits
DonaldTrump    -0.75
crush          -0.68
orial          -0.65
apsed          -0.65
itious         -0.63
reader         -0.63
bread          -0.61
UCT            -0.61
iph            -0.61
graded         -0.60
Positive Logits
ø              1.24
Andersen       0.99
hett           0.87
Ã¥             0.87
¶              0.86
ĨĴ             0.85
ð              0.84
æ              0.83
ö              0.81
borg           0.81
Activations Density: 0.005%