Explanations
numerical values or quantities in the text
Negative Logits
Token            Logit
Administración   -0.45
,                -0.43
Ausnahme         -0.42
Administração    -0.41
.                -0.40
The              -0.40
following        -0.39
Außerdem         -0.38
Dabei            -0.37
Allerdings       -0.37
Positive Logits
Token            Logit
<unused79>       0.72
<unused16>       0.72
<unused68>       0.71
<unused3>        0.71
<unused42>       0.71
<unused41>       0.71
<unused43>       0.71
<unused28>       0.71
<unused8>        0.71
<unused17>       0.71
Activations Density: 0.133%