INDEX
Explanations
contractions of the words "would" and "had"
New Auto-Interp
Negative Logits
PI
-0.67
aders
-0.66
Gutenberg
-0.63
Hels
-0.63
IDA
-0.63
士
-0.59
strings
-0.59
MODULE
-0.59
vich
-0.58
Compass
-0.58
POSITIVE LOGITS
been
1.07
gotten
1.05
gotta
1.01
gone
0.93
been
0.84
hran
0.80
Been
0.80
gladly
0.78
got
0.76
ridden
0.75
Activations Density 0.016%