Explanations
phrases emphasizing empowerment and personal responsibility
Negative Logits
owitz        -0.19
odash        -0.17
.unpack      -0.16
tank         -0.15
ihn          -0.15
806          -0.14
<location    -0.14
ekim         -0.14
Coord        -0.14
strand       -0.14
Positive Logits
hands        0.38
manos        0.30
Hands        0.30
safe         0.25
custody      0.24
Hands        0.24
capable      0.24
into         0.24
nelle        0.23
hands        0.23
Activations Density 0.056%