INDEX
Negative Logits
Ⅵ  0.36
robustness  0.32
狨  0.32
DataDiv  0.32
devaluation  0.32
النسبيه  0.31
श्रमिकों  0.30
📉  0.30
Ⅷ  0.30
必须  0.29
Positive Logits
<start_of_image>  0.39
h  0.32
pr  0.32
f  0.31
hés  0.30
cB  0.30
ob  0.30
They  0.29
Notably  0.29
So  0.29
Activations Density: 0.009%