INDEX
Explanations
words related to actions or tasks that are not being done or are forbidden
instances of the word "does" and its variations
Negative Logits
Token         Logit
Dise          -0.74
)=(           -0.73
Tai           -0.67
ulative       -0.66
case          -0.65
palms         -0.65
Handling      -0.65
Methods       -0.64
Reviewer      -0.64
Replacement   -0.63
Positive Logits
Token         Logit
ppel          1.09
omsday        0.92
berman        0.91
ozy           0.90
herty         0.90
not           0.88
indeed        0.88
pez           0.88
nothing       0.85
likewise      0.84
Activations Density: 0.134%