INDEX
Explanations
emotional language and expressions related to praise or criticism
NEGATIVE LOGITS
裏�        -0.74
uyomi      -0.69
ACP        -0.67
thodox     -0.66
ーテ       -0.65
龍契士     -0.65
Struct     -0.62
Eastern    -0.60
PF         -0.59
gd         -0.59
POSITIVE LOGITS
yourselves  1.59
yourself    1.20
Tube        0.86
your        0.84
your        0.81
YOUR        0.75
Yourself    0.75
majesty     0.73
cunt        0.73
sir         0.72
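The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. The page does not say how they were computed. A common approach, and an assumption here, is to project the feature's decoder direction through the model's unembedding matrix and keep the most extreme entries. The minimal sketch below uses random toy weights; `top_logits`, `w_dec_row`, and `w_unembed` are hypothetical names, not this dashboard's API.

```python
import numpy as np


def top_logits(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    w_dec_row: shape (d_model,), the feature's decoder direction (hypothetical input).
    w_unembed: shape (d_model, vocab_size), the model's unembedding matrix.
    vocab: list of vocab_size token strings.
    Returns (most_negative, most_positive), each a list of (token, value) pairs.
    """
    # Project the feature direction into vocabulary space: one score per token.
    logit_effect = np.asarray(w_dec_row) @ np.asarray(w_unembed)  # (vocab_size,)
    order = np.argsort(logit_effect)  # ascending: most negative first
    most_negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    most_positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return most_negative, most_positive


# Toy example with random weights (not the real model).
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
vocab = [f"tok{i}" for i in range(vocab_size)]
neg, pos = top_logits(rng.normal(size=d_model),
                      rng.normal(size=(d_model, vocab_size)), vocab, k=3)
print("NEGATIVE LOGITS:", neg)
print("POSITIVE LOGITS:", pos)
```

With a real model, `w_unembed` would be its unembedding matrix and `vocab` its decoded token strings; whether this dashboard also folds in a final layer norm or bias is not stated on the page.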
Activation density: 0.284%
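Activation density is usually read as the share of sampled tokens on which the feature fires, so 0.284% corresponds to roughly 2.84 tokens per thousand. The sketch below assumes "fires" means an activation strictly above zero and that the activations are available as a flat array; the function name `activation_density` is hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy example: the feature fires on 3 of 1,000 tokens.
acts = np.zeros(1000)
acts[[10, 500, 999]] = [0.7, 1.2, 0.4]
print(f"{activation_density(acts):.3f}%")  # 0.300%
```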