INDEX
Explanations
the contraction "it's"
instances of the letter 's' in various contexts
Negative Logits
Reef        -0.83
Lauder      -0.64
Benny       -0.64
oop         -0.62
GROUP       -0.58
POL         -0.57
Reporting   -0.57
geries      -0.56
Compliance  -0.56
Polk        -0.55
Positive Logits
lightly     0.72
forth       0.68
agna        0.67
theless     0.66
udder       0.63
selves      0.60
self        0.59
grim        0.59
plete       0.59
spoiler     0.59
Activation Density: 0.287%
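The dashboard does not say how activation density is computed. The sketch below assumes it is the percentage of tokens on which the feature's activation is above zero. The function name `activation_density` and its `threshold` parameter are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" = share of tokens with activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature fires above the threshold
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 287 active tokens out of 100,000
acts = [1.0] * 287 + [0.0] * (100_000 - 287)
print(f"{activation_density(acts):.3f}%")  # → 0.287%
```

Under this assumption, a density of 0.287% means the feature fires on roughly 3 tokens in every 1,000.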