INDEX
NEGATIVE LOGITS
undesirable    0.36
heterogeneous  0.34
erroneous      0.34
inaccurate     0.33
               0.33
an             0.33
species        0.32
object         0.32
concrete       0.31
incorrect      0.31
POSITIVE LOGITS
aveva  0.37
hadde  0.36
ayaa   0.36
who    0.35
ėjo    0.34
miał   0.34
നേതൃ   0.33
had    0.33
坐在   0.33
udeau  0.33
Activations Density 0.030%