INDEX
Explanations
phrases encouraging personal evaluation or decision-making
Negative Logits
stí      -0.15
reek     -0.15
åį       -0.14
enu      -0.14
elight   -0.14
andom    -0.14
Greg     -0.14
miner    -0.14
.rd      -0.14
πού      -0.14
Positive Logits
yourself     0.35
yourselves   0.28
themselves   0.28
Yourself     0.28
herself      0.25
自己         0.23
ourselves    0.23
себе         0.23
sobie        0.22
เอง          0.22
Activations Density 0.054%