INDEX
Explanations
references to personal agency and self-empowerment
Negative Logits
Token       Logit
ellt        -0.17
idlo        -0.16
annis       -0.15
овар        -0.15
ynamo       -0.15
ebo         -0.15
dük         -0.15
kok         -0.14
annels      -0.14
.Elapsed    -0.14
Positive Logits
Token       Logit
soft        0.19
Soft        0.18
soften      0.16
Mac         0.16
Hard        0.16
áz          0.15
William     0.15
Andres      0.15
Soft        0.15
hard        0.15
Activations Density 0.029%
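
Activation density is usually read as the fraction of token positions on which the feature fires at all. The page does not state its exact definition, so the following is a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumes "density" means the share of positions where the feature is active;
    the dashboard's own definition may differ.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Toy data, made up: 10,000 token positions, 3 of which fire.
toy = [0.0] * 10_000
for pos in (17, 4_200, 9_999):
    toy[pos] = 1.5

density = activation_density(toy)
print(f"{density:.3%}")
```

On this reading, a density of 0.029% would mean the feature is active on roughly 3 of every 10,000 tokens sampled.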