Negative Logits
rotch        -0.26
tif          -0.26
correctly    -0.26
brick        -0.26
braz         -0.24
ickness      -0.24
rück         -0.24
inci         -0.24
ости         -0.24
omnia        -0.23
Positive Logits
elor          0.26
Hä            0.26
swapped       0.25
ale           0.24
el            0.24
orama         0.24
议            0.23
vill          0.23
ature         0.23
imer          0.23
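The lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way SAE dashboards produce such lists is to project the feature's decoder direction onto the model's unembedding matrix and sort the result. The source does not state its exact method, so the sketch below assumes that convention. `logit_effects`, `top_logits`, and the toy matrices are hypothetical illustrations, not any library's API.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder direction (length d_model) with every
    vocabulary column of an unembedding matrix stored as d_model rows
    by vocab columns. Assumed convention, for illustration only."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][v] for i in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_logits(effects: Sequence[float], tokens: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, value) pairs,
    each list ordered from the most extreme value inward."""
    order = sorted(range(len(effects)), key=lambda v: effects[v])
    negative = [(tokens[v], round(effects[v], 2)) for v in order[:k]]
    positive = [(tokens[v], round(effects[v], 2)) for v in reversed(order[-k:])]
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.1, 0.3, 0.0],
           [0.4, 0.2, -0.2, 0.1]]
effects = logit_effects(decoder_dir, unembed)
neg, pos = top_logits(effects, ["a", "b", "c", "d"], k=2)
print("Negative:", neg)
print("Positive:", pos)
```

On a real model the vocabulary has tens of thousands of entries and k is around 10, which yields two short lists like the ones shown.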
Activation Density: 0.020%
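Activation density is usually read as the percentage of sampled tokens on which the feature fires, meaning its activation is above zero. The source gives only the number, so that definition is an assumption here. Under that reading, 0.020% means roughly 1 token in 5,000. The helper `activation_density` below is a hypothetical sketch that reproduces the figure from a synthetic sample of 10,000 activations, 2 of which are nonzero.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`
    (assumed definition: the feature 'fires' when activation > 0)."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Synthetic sample: 10,000 tokens, the feature fires on 2 of them.
sample = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(sample)
print(f"Activation Density: {density:.3f}%")
```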