Explanations
words related to vulnerability or lack of control
expressions of helplessness and powerlessness
Negative Logits
Token       Logit
ickr        -0.81
issue       -0.74
edia        -0.74
anners      -0.64
aters       -0.64
WAYS        -0.63
cius        -0.63
rosso       -0.62
dule        -0.62
ramid       -0.62
Positive Logits
Token             Logit
helpless          1.38
nesses            1.10
ness              1.08
NESS              1.01
ingly             0.83
strugg            0.83
powerless         0.83
redes             0.80
hopeless          0.79
TPPStreamerBot    0.76
Activation Density: 0.016%