Negative Logits
Perspective  -0.08
$l           -0.07
violated     -0.06
violates     -0.06
الك          -0.06
tember       -0.06
backing      -0.06
differently  -0.06
destek       -0.06
.Man         -0.06
Positive Logits
IOD    0.07
Yun    0.06
737    0.06
xD     0.06
uchos  0.06
resil  0.06
vysok  0.06
رى     0.06
ुट     0.06
+      0.06
Activations Density 0.190%