INDEX
Explanations
phrases where the words "the" and another word are close to each other
repetitive use of the word "the."
Negative Logits
Token      Logit
tackle     -0.89
anon       -0.71
thood      -0.71
=#         -0.69
adays      -0.68
quished    -0.68
iversal    -0.67
again      -0.67
still      -0.66
CLA        -0.66
Positive Logits
Token       Logit
slightest   1.31
simplest    1.29
smallest    1.16
finest      1.09
cheapest    1.06
basics      1.04
hars        1.00
richest     0.98
easiest     0.98
wealthiest  0.98
Activations Density 0.152%