NEGATIVE LOGITS
For         0.49
Control     0.49
Outcome     0.49
र           0.47
राम         0.47
Conversely  0.47
不过        0.46
ט           0.43
紓          0.43
א           0.42
POSITIVE LOGITS
scientist   0.51
beauties    0.51
chefs       0.50
explot      0.50
degrad      0.50
vulgar      0.49
microns     0.48
enzym       0.48
starch      0.48
extravag    0.47
Activations Density: 0.010%