INDEX
Explanations
contractions of "is" with another word following it
Negative Logits
roit          -0.78
eal           -0.76
ares          -0.71
soever        -0.64
icut          -0.61
wastes        -0.61
agos          -0.60
approves      -0.59
implements    -0.59
ally          -0.58
Positive Logits
gotta         1.08
been          1.04
plenty        1.02
gonna         0.96
always        0.93
no            0.84
nothing       0.83
definitely    0.81
still         0.80
lots          0.80
Activations Density: 0.048%