Explanations
language related to psychological distress or damage
Negative Logits
Token    Logit
esiz     -0.16
ikut     -0.15
Sole     -0.14
iffe     -0.14
sole     -0.14
末       -0.14
edin     -0.13
anela    -0.13
酸       -0.13
ислов    -0.13
Positive Logits
Token    Logit
overall  0.35
overall  0.30
Overall  0.28
certain  0.28
general  0.27
Overall  0.26
general  0.25
Certain  0.23
Certain  0.21
basic    0.21
Activations Density 0.005%