Explanations
proper nouns with additional symbols or characters
special characters or non-standard symbols in the text
Negative Logits
crush          -0.65
INT            -0.61
place          -0.61
IPS            -0.61
ulate          -0.60
gist           -0.59
DonaldTrump    -0.59
iph            -0.59
Cobra          -0.59
itutional      -0.58
Positive Logits
ø              1.34
Andersen       0.98
Ã¥             0.91
hett           0.89
ĨĴ             0.87
ð              0.86
æ              0.85
borg           0.85
Bok            0.83
ö              0.83
Activations Density: 0.004%