INDEX
Explanations
references to specific locations or names, especially related to current events or politics
instances of the word "the."
Negative Logits
aba         -0.77
because     -0.73
thood       -0.72
elaide      -0.71
eno         -0.71
Ò           -0.70
acy         -0.68
!!!!        -0.67
leground    -0.67
ional       -0.65
Positive Logits
slightest   1.04
simplest    1.04
biggest     1.03
vast        1.01
odore       1.01
latter      1.00
resa        0.99
majority    0.99
entire      0.98
oldest      0.98
Activations Density 0.274%