INDEX
Explanations
expressions of negativity or lack, often related to an absence or deficiency
Negative Logits
kla        -0.15
ehler      -0.15
buz        -0.14
Unnamed    -0.14
ç½         -0.14
rotch      -0.14
PLETED     -0.13
kaar       -0.13
RG         -0.13
ذر         -0.13
Positive Logits
harmful        0.18
sil            0.16
noise          0.16
adden          0.15
fear           0.15
769            0.15
Fear           0.14
clutter        0.14
vulnerability  0.14
apore          0.14
Activations Density 0.261%