INDEX
Explanations
words related to emphasizing importance or significance
the word "the" in various contexts throughout the text
Negative Logits
craft      -0.82
claw       -0.74
each       -0.68
fw         -0.68
besides    -0.67
leeve      -0.67
abuse      -0.67
adoes      -0.67
ago        -0.65
rade       -0.64
Positive Logits
easiest    1.27
simplest   1.22
same       1.18
strongest  1.17
greatest   1.15
biggest    1.13
heaviest   1.12
largest    1.11
smallest   1.10
hardest    1.09
Activations Density: 0.303%