INDEX
NEGATIVE LOGITS
Token      Value
1          0.71
standout   0.70
ר          0.70
0          0.70
loro       0.67
harrowing  0.63
2          0.63
al         0.62
dialogue   0.62
ی          0.62
POSITIVE LOGITS
Token      Value
猀          0.62
𝐜          0.61
დროს       0.61
Kräfte     0.61
బర్        0.60
áme        0.60
人不        0.58
ве         0.57
вано       0.57
悎          0.57
Activations Density 0.009%