INDEX
NEGATIVE LOGITS
Token        Logit
nama         -0.07
(rule        -0.07
arem         -0.07
_energy      -0.06
ात           -0.06
.sam         -0.06
„ظ           -0.06
ToStr        -0.06
stration     -0.06
MM           -0.06
POSITIVE LOGITS
Token        Logit
Copying      0.07
Luckily      0.07
Prompt       0.06
scratched    0.06
ucked        0.06
both         0.06
TERN         0.06
promptly     0.06
narratives   0.06
hızlı        0.06
Activations Density: 0.006%