Explanations
references to unknown or ambiguous situations or events
Negative Logits
Token      Logit
fait       -0.17
mps        -0.16
("'"       -0.14
ramer      -0.14
onde       -0.14
ghan       -0.14
oyal       -0.14
ÑģоÑĤ       -0.13
Hills      -0.13
849        -0.13
Positive Logits
Token      Logit
wrong      0.27
wrong      0.25
Wrong      0.23
Wrong      0.23
WRONG      0.19
fish       0.18
_wrong     0.16
missing    0.16
bjerg      0.16
Missing    0.16
Activations Density: 0.044%