INDEX
Negative Logits
прият  0.54
고려  0.48
பிர  0.46
при  0.46
东南  0.46
හොඳ  0.46
่  0.45
तमिल  0.45
colorChoice  0.45
吴  0.44
Positive Logits
(blank token)  0.57
Marxism  0.50
'  0.48
PHYSICS  0.46
clicked  0.45
manifold  0.44
dizziness  0.44
medicine  0.44
glaciers  0.44
water  0.43
Activations Density 0.002%