Explanations
playful or whimsical language and imagery
Negative Logits
ormsg    -0.19
raci     -0.17
agate    -0.17
lients   -0.16
hend     -0.15
.bd      -0.15
keh      -0.14
ày       -0.14
born     -0.14
irq      -0.14
Positive Logits
Bab      0.16
-pop     0.15
pop      0.15
bois     0.14
kins     0.14
ॉप       0.14
hops     0.14
ories    0.14
pop      0.14
hal      0.14
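The logit lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists are commonly produced, assuming the feature is a sparse-autoencoder latent whose decoder direction is projected directly through the model's unembedding matrix (ignoring any final layer norm). The function name `top_logit_effects` and its inputs are hypothetical illustrations, not this dashboard's actual code.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature direction.

    Hypothetical sketch (assumption: effect = decoder_dir @ unembed).
    decoder_dir: list of d_model floats for the feature's decoder direction
    unembed:     d_model x d_vocab matrix given as a list of rows
    vocab:       list of d_vocab token strings
    """
    d_model = len(decoder_dir)
    # effect[j] = sum_i decoder_dir[i] * unembed[i][j]
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative effect first
    positive = ranked[::-1][:k]  # most positive effect first
    return negative, positive


# Toy example with d_model = 2 and a four-token vocabulary.
neg, pos = top_logit_effects(
    [1.0, 0.5],
    [[0.2, -0.1, 0.0, 0.3],
     [0.0, 0.2, -0.4, 0.1]],
    ["pop", "irq", "born", "hops"],
    k=2,
)
print([t for t, _ in neg])  # ['born', 'irq']
print([t for t, _ in pos])  # ['hops', 'pop']
```

With k=10 this yields two ten-entry lists like the ones shown, though the real values depend on the actual model weights.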
Activation Density 0.063%
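Activation density is usually read as the fraction of token positions on which the feature fires, so 0.063% corresponds to roughly 63 firing positions per 100,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name and threshold parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Hypothetical sketch; assumes a feature "fires" when activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 63 firing positions out of 100,000 tokens.
acts = [0.0] * 99_937 + [1.0] * 63
density = activation_density(acts)
print(f"{density:.3%}")  # 0.063%
```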