Explanations
phrases expressing absurdity or ridiculousness in various contexts
Negative Logits
Manip           -0.38
Incl            -0.35
neutral         -0.34
relu            -0.33
manip           -0.33
noDo            -0.33
neutral         -0.32
insuffisamment  -0.32
keenly          -0.32
manip           -0.32
Positive Logits
absurdity       0.94
absurd          0.88
ridiculous      0.88
absur           0.84
crazy           0.84
posterous       0.84
bizarre         0.81
locura          0.81
craz            0.79
madness         0.78
Activations Density 0.471%