INDEX
Negative Logits
pleasing       -0.08
Б              -0.08
               -0.07
Brian          -0.07
popular        -0.07
technological  -0.07
manipulation   -0.07
Brian          -0.07
прод           -0.07
atractivo      -0.07
Positive Logits
rypton     0.09
jusqu      0.09
Tests      0.09
్రీ        0.09
até        0.08
vigilance  0.08
alatt      0.08
rif        0.08
जांच       0.08
erros      0.08
Activations Density 0.008%