INDEX
Explanations
negative descriptors or criticisms
instances of the word "silly."
Negative Logits
Token       Logit
yer         -0.95
Reviewed    -0.92
rigan       -0.84
ept         -0.81
ainer       -0.81
ioch        -0.81
ainers      -0.79
amen        -0.78
rien        -0.76
lain        -0.76
Positive Logits
Token       Logit
silly       1.06
nonsense    0.91
aside       0.87
Ples        0.84
prank       0.81
ness        0.81
childish    0.79
Haram       0.78
Pry         0.76
ishly       0.76
Activations Density 0.018%