INDEX
NEGATIVE LOGITS
...,         -0.08
DH           -0.08
tri          -0.08
obnox        -0.07
eccentric    -0.07
richtige     -0.07
trig         -0.07
stupid       -0.07
nhỏ          -0.07
libs         -0.07
POSITIVE LOGITS
Enfin        0.08
(t           0.08
(layer       0.07
Watkins      0.07
taking       0.07
(x           0.07
masyarakat   0.07
had          0.07
Lastly       0.07
ukk          0.07
Activations Density 0.011%