Explanations
expressions of willingness and openness to help or engage with others
Negative Logits
ercul            -0.18
.scalablytyped   -0.17
omaly            -0.16
atten            -0.16
atters           -0.15
erson            -0.15
cctor            -0.15
succesfully      -0.14
uart             -0.14
ÐľÐŀ             -0.14
Positive Logits
sacrifice        0.23
accepting        0.20
challenge        0.20
accept           0.19
accepts          0.19
sacr             0.19
sacrifices       0.19
æİ¥åıĹ           0.18
Sacr             0.18
Challenge        0.18
Activations Density 0.093%