Explanations
The word "the", with high activation
Instances of the word "the"
Negative Logits
Token       Logit
anni        -0.80
lich        -0.78
-+-+        -0.78
ambo        -0.76
alde        -0.75
cade        -0.72
tu          -0.72
den         -0.71
alloc       -0.69
ploy        -0.68
Positive Logits
Token       Logit
plunge       1.42
brunt        1.31
initiative   1.16
bait         1.16
reins        1.16
opportunity  1.14
helm         1.13
liberty      1.13
precaution   1.05
leap         1.03
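The two lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The page does not say how the lists are computed; a common approach, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the most negative and most positive results. The sketch below uses hypothetical names (`logit_effects`, `top_and_bottom`) and toy three-dimensional vectors, not real model weights.

```python
# Hypothetical sketch: rank tokens by one feature's direct logit effect.
# ASSUMPTION: effect(token) = dot(decoder_dir, unembedding vector of token).
# The dashboard's actual pipeline is not described in the source.

def logit_effects(decoder_dir, unembed):
    """Return {token: dot(decoder_dir, unembed[token])}."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative

# Toy data with placeholder token names (not taken from the dashboard).
decoder_dir = [0.5, -1.0, 0.25]
unembed = {
    "tok_a": [1.0, -1.0, 0.0],
    "tok_b": [0.0, -1.0, 0.0],
    "tok_c": [-1.0, 0.5, 0.0],
    "tok_d": [0.0, 0.5, 0.0],
}

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(positive)  # [('tok_a', 1.5), ('tok_b', 1.0)]
print(negative)  # [('tok_c', -1.0), ('tok_d', -0.5)]
```

On a real model the same ranking would run over the full vocabulary with k = 10, matching the length of each list above.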
Activation density: 0.044%
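A density of 0.044% means the feature is active on roughly 44 of every 100,000 token positions. The sketch below shows that arithmetic, assuming "active" means an activation strictly above a threshold of 0; the dashboard's exact threshold is not stated in the source, and `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch: activation density as the percentage of token
# positions where a feature fires.
# ASSUMPTION: a position counts as active when its activation > threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy example: 44 active positions out of 100,000.
acts = [0.0] * 99_956 + [1.0] * 44
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.044%
```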