Negative Logits
мою          -0.83
freezes      -0.79
recupero     -0.75
زوج          -0.75
正直         -0.73
Sympathi     -0.71
ดง           -0.70
vĩnh         -0.69
alluminio    -0.69
trainable    -0.68
Positive Logits
safety       2.61
security     2.44
sécurité     2.16
Security     2.11
Safety       2.08
Safety       1.97
safety       1.96
security     1.94
seguridad    1.92
SAFETY       1.84
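Each list is a top-10 selection: the tokens whose output logits the feature raises the most (positive) and lowers the most (negative). The sketch below shows only that selection step. It assumes a hypothetical `logit_effects` dict mapping each vocabulary token to its logit contribution. In typical SAE dashboards those contributions come from projecting the feature's decoder direction through the model's unembedding, but that is an assumption here, not something stated in this page.

```python
import heapq


def top_and_bottom_logits(logit_effects, k=10):
    """Return the k tokens with the largest and the k with the smallest logit effect.

    logit_effects: hypothetical dict mapping token string -> logit contribution
    (assumed to be precomputed for the whole vocabulary).
    """
    items = list(logit_effects.items())
    # nlargest returns results sorted from highest to lowest value.
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    # nsmallest returns results sorted from lowest to highest value.
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return positive, negative


# Toy input: a few values copied from the lists above; a real vocabulary is far larger.
effects = {
    "safety": 2.61,
    "security": 2.44,
    "sécurité": 2.16,
    "мою": -0.83,
    "freezes": -0.79,
    "recupero": -0.75,
}
pos, neg = top_and_bottom_logits(effects, k=2)
print(pos)  # [('safety', 2.61), ('security', 2.44)]
print(neg)  # [('мою', -0.83), ('freezes', -0.79)]
```

A heap-based selection avoids fully sorting a vocabulary of tens of thousands of tokens when only the extremes are needed.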
Activation density: 0.011%
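Assuming the usual dashboard definition, activation density is the fraction of tokens in the evaluation data on which the feature's activation is above zero. Under that definition, 0.011% means the feature fires on roughly 1 token in 9,000. The sketch below illustrates the calculation. The function name, the `threshold` parameter, and the synthetic activations are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which a feature's activation exceeds `threshold`.

    activations: iterable of per-token activation values for one feature
    (hypothetical input; a real dashboard would collect these over a large corpus).
    """
    values = list(activations)
    if not values:
        return 0.0
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Synthetic example: 11 active tokens out of 100,000 gives a density of 0.011%.
acts = [0.0] * 100_000
for i in range(11):
    acts[i * 9000] = 1.5
print(f"{activation_density(acts):.3%}")  # 0.011%
```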