Negative Logits
भ्रष्टाचार      0.77
sweetness       0.75
encuentros      0.73
robustness      0.71
slander         0.71
honesty         0.71
generosity      0.69
eloquence       0.69
SecurityCenter  0.68
sayısı          0.68
Positive Logits
commands  1.44
Commands  1.43
command   1.32
Commands  1.29
Command   1.29
명령      1.25
commands  1.25
mandate   1.20
コマンド  1.19
Command   1.18
Activations Density 0.010%