Explanations
phrases related to being punished or reprimanded
references to slapping or related actions and their implications
Negative Logits
tis       -0.76
icult     -0.75
ernand    -0.71
ests      -0.68
EMBER     -0.67
oÄŁ       -0.61
éĸ        -0.61
Seventh   -0.59
mpeg      -0.59
Grande    -0.59
Positive Logits
dash      1.20
dab       1.07
creen     1.05
stick     0.94
down      0.75
lihood    0.75
brush     0.73
bang      0.73
ction     0.73
metry     0.72
Activations Density 0.077%