Explanations
starting a comprehensive overview
Negative Logits
l       1.41
i       1.16
]$.     1.13
+}$     1.05
또한    0.95
h       0.92
ર       0.92
TP      0.90
>       0.89
ف       0.89
Positive Logits
an        1.23
is        1.19
at        1.18
comes     1.16
a         1.09
chez      1.05
kommt     1.05
from      1.05
sobr      1.04
convuls   1.03
Activations Density 0.555%