Explanations
is adjective
is significant, beyond, lower
Negative Logits
;      0.68
(      0.63
*      0.61
in     0.61
在     0.61
?      0.59
>      0.51
[      0.49
#      0.49
(.*    0.48
Positive Logits
d          0.65
dır        0.54
is         0.52
dı         0.52
larında    0.51
larını     0.50
u          0.50
いた       0.50
もの       0.50
dj         0.50
Activations Density 1.119%