INDEX
Explanations
the letter 's' at the end of words
the word "is" and its forms in various contexts
Negative Logits
override      -0.66
outweigh      -0.65
Prior         -0.64
alys          -0.63
Reef          -0.61
takedown      -0.60
evaluations   -0.60
scares        -0.59
Prior         -0.59
Hebdo         -0.58
Positive Logits
outhern       0.87
pecially      0.83
been          0.82
forth         0.77
enegger       0.76
igi           0.75
inki          0.75
leeve         0.74
por           0.74
lightly       0.73
Activations Density: 0.109%