INDEX
Explanations
phrases related to official reports or statements
the word "the."
Negative Logits
Ò             -0.70
aba           -0.68
thood         -0.67
leeve         -0.66
because       -0.66
ceive         -0.66
eno           -0.65
1200          -0.65
bg            -0.65
beforehand    -0.63
Positive Logits
biggest       1.11
resa          1.11
oret          1.07
oldest        1.06
odore         1.06
simplest      1.05
largest       1.03
latter        1.01
vast          1.00
slightest     1.00
Activations Density 0.438%