Explanations
terms related to destruction or negative impact on individuals or groups
Negative Logits
ingo        -0.15
795         -0.14
voks        -0.14
åºŁ         -0.13
ÙĪØ§Ø¡      -0.13
ÑĸÑĩ        -0.13
aturity     -0.13
Fork        -0.13
verbosity   -0.13
fullest     -0.13
Positive Logits
fragile      0.32
hopes        0.29
chances      0.28
integrity    0.26
attempts     0.26
delicate     0.26
reput        0.25
carefully    0.24
ability      0.23
reputation   0.23
Activations Density 0.402%