INDEX
NEGATIVE LOGITS
(  0.76
يد  0.70
ER  0.66
UM  0.66
AT  0.65
EX  0.65
premiere  0.61
েন  0.61
ньше  0.61
Aqui  0.61
POSITIVE LOGITS
tật  0.73
o  0.73
૧  0.73
೧  0.71
definitions  0.70
饰  0.70
factors  0.70
рка  0.67
제작  0.67
دھو  0.66
Activations Density 0.004%