Negative Logits
Token         Value
Safety        0.57
es            0.56
Integrity     0.53
integrity     0.52
efficacy      0.52
is            0.52
yl            0.52
Statistical   0.51
Advocate      0.51
um            0.50
Positive Logits
Token         Value
aceler        0.78
روند          0.75
hastened      0.73
hasten        0.70
accél         0.70
уско          0.69
acceler       0.65
accelerate    0.64
accelerated   0.64
ускори        0.63
Activations Density 0.128%