Explanations
Instances of the term "smug."
Negative Logits
Token      Logit
olet       -0.16
oq         -0.15
quine      -0.15
Forgot     -0.15
o          -0.15
agate      -0.14
nehmen     -0.14
lict       -0.14
ivated     -0.14
igenous    -0.14
Positive Logits
Token      Logit
older      0.26
other      0.24
idge       0.22
arts       0.22
elly       0.21
ears       0.21
ug         0.21
ould       0.21
sm         0.21
acking     0.20
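Logit lists like the two above are usually obtained by projecting the feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes exactly that computation; it is not confirmed by this dashboard. The function name `logit_effects`, its parameters, and the toy vocabulary and weights are all hypothetical and exist only for illustration.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: score every token by decoder_dir @ w_unembed.

    decoder_dir: the feature's decoder vector, length d_model (assumed).
    w_unembed:   unembedding matrix, d_model rows of length vocab_size (assumed).
    Returns (k highest-scoring tokens, k lowest-scoring tokens).
    """
    d_model = len(decoder_dir)
    # logits[j] = sum over i of decoder_dir[i] * w_unembed[i][j]
    logits = [
        sum(decoder_dir[i] * w_unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score; ties keep vocabulary order.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # k lowest scores, most negative first
    positive = ranked[::-1][:k]    # k highest scores, most positive first
    return positive, negative


# Toy example with made-up weights (d_model = 2, vocabulary of 4 tokens).
toy_vocab = ["sm", "ug", "olet", "other"]
toy_dir = [1.0, 0.5]
toy_w_u = [
    [0.3, 0.1, -0.1, 0.0],
    [0.0, 0.2, -0.1, 0.3],
]
top, bottom = logit_effects(toy_dir, toy_w_u, toy_vocab, k=2)
print("positive:", top)
print("negative:", bottom)
```

On a real model the same ranking would run over the full vocabulary, which is why sub-word fragments such as "sm" and "ug" can appear as separate entries.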
Activation Density: 0.006%
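Activation density is commonly read as the fraction of tokens on which the feature's activation is above zero. Assuming that definition (the dashboard does not state its threshold), 0.006% equals a fraction of 6e-5, or roughly 6 firing tokens per 100,000. The helper `activation_density` below is a hypothetical sketch of that calculation, not the dashboard's own code.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Converting the reported density from a percentage to a fraction.
density_percent = 0.006
fraction = density_percent / 100          # about 6e-05
expected_per_100k = fraction * 100_000    # about 6 firing tokens

print(activation_density([0.0, 0.0, 1.2, 0.0]))
print(fraction, expected_per_100k)
```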