Explanations
expressions of fear and risk related to personal safety
Negative Logits
Token      Logit
ehler      -0.16
ĮĴ         -0.15
culprit    -0.15
خش         -0.14
Įĵ         -0.14
Ậ          -0.14
guilty     -0.14
insi       -0.13
affen      -0.13
Regards    -0.13
Positive Logits
Token      Logit
being      0.40
being      0.34
Being      0.29
Being      0.29
becoming   0.29
scrutiny   0.28
被         0.27
detection  0.25
losing     0.24
attack     0.23
Activations Density: 0.227%