Negative Logits
Token        Value
躹           0.46
fü           0.45
Conte        0.44
подобные     0.44
Indirect     0.44
หลาย          0.43
Phạm         0.43
подобных     0.43
phần         0.43
геометри     0.42
Positive Logits
Token        Value
correctly    0.65
randomly     0.61
observed     0.61
chosen       0.60
occurred     0.59
chosen       0.57
flips        0.55
selected     0.55
observing    0.55
observed     0.55
Activation Density: 0.062%