INDEX
Explanations
expressions of empathy and pity towards individuals or groups
Negative Logits
yne               -0.16
طار               -0.16
raith             -0.15
abh               -0.15
desi              -0.14
аниц              -0.14
.AutoScaleMode    -0.14
edik              -0.14
eon               -0.13
arro              -0.13
Positive Logits
691               0.15
kontakte          0.15
nv                0.14
Rolls             0.14
udi               0.14
ooter             0.14
Caval             0.14
atoi              0.14
renal             0.14
ROLL              0.14
Activations Density 0.051%