INDEX
Explanations
the word "the."
instances of the word "watch."
Negative Logits
Token       Logit
thood       -0.75
ucl         -0.74
uti         -0.71
ccoli       -0.69
recy        -0.69
eno         -0.67
iev         -0.66
manship     -0.65
trl         -0.65
aspberry    -0.65
Positive Logits
Token       Logit
same        1.28
entirety    1.15
entire      1.15
slightest   1.14
latest      1.11
smallest    1.10
whole       1.07
ses         0.99
aftermath   0.98
vast        0.98
Activations Density: 0.402%