Explanations
describing expected test outcomes
Negative Logits
Token        Logit
spannende    -1.70
élimin       -1.52
besonder     -1.44
véritable    -1.43
);           -1.41
spektak      -1.34
essura       -1.31
];           -1.30
gezogen      -1.30
opravdu      -1.29
Positive Logits
Token        Logit
There        1.73
now          1.63
not          1.54
!            1.52
prata        1.51
Other        1.47
all          1.45
Some         1.45
........     1.45
this         1.45
Activations Density: 0.004%