INDEX
Negative Logits
privacy    0.93
phrases    0.92
always     0.89
politely   0.85
requests   0.84
anyakan    0.82
queries    0.81
请求       0.81
dialogue   0.79
polite     0.79
Positive Logits
parcialmente    1.03
dennoch         0.90
debacle         0.89
Partial         0.86
succumb         0.84
succumbed       0.84
компен          0.83
partiellement   0.81
توڑ             0.79
enraged         0.79
Activations Density 0.934%