INDEX
Explanations
high-frequency terms or articles in the text
Negative Logits
Token        Logit
thood        -0.83
perse        -0.78
����         -0.76
ashington    -0.74
leground     -0.73
Scotland     -0.71
minent       -0.70
angering     -0.70
uclear       -0.69
20439        -0.68
Positive Logits
Token        Logit
downside     1.30
oret         1.27
biggest      1.16
drawback     1.11
sheer        1.08
easiest      1.06
simplest     1.05
coolest      1.00
latter       0.99
cheapest     0.98
Activations Density 0.206%