Explanations
references to individuals and their affiliations or titles
punctuation and structural elements
Negative Logits
niſſe            -0.86
ſſung            -0.84
RegressionTest   -0.83
imagui           -0.82
ſind             -0.82
abestanden       -0.81
fashiola         -0.79
disambiguazione  -0.78
<unused20>       -0.77
[@BOS@]          -0.77
Positive Logits
the     0.61
the     0.60
our     0.58
selaku  0.46
our     0.36
who     0.35
your    0.32
their   0.32
min     0.30
ti      0.30
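The page does not say how these logit values were produced. Dashboards of this kind commonly project the feature's decoder direction through the model's unembedding matrix and keep the k most negative and k most positive tokens. The sketch below assumes that method; `logit_effects` and `top_and_bottom` are hypothetical helper names, and the d_model x vocab_size layout of the unembedding is an assumption. Ranking (token, score) pairs rather than a token-keyed dict lets the same display string appear twice, as "the" and "our" do above (presumably distinct vocabulary entries that render identically).

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Direct effect of a feature direction on each output token's logit.

    Assumption: `unembed` is a d_model x vocab_size matrix given as row lists,
    so the score for token v is the dot product of decoder_dir with column v.
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal the number of unembedding rows")
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][v] for i in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_and_bottom(tokens: Sequence[str], scores: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]                # most negative first
    top = list(reversed(ranked))[:k]   # most positive first
    return top, bottom


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
scores = logit_effects([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]])
top, bottom = top_and_bottom(["a", "b", "c"], scores, k=1)
```

With a real model, `decoder_dir` would be the feature's row of the SAE decoder and `unembed` the model's unembedding weights; only the shapes change.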
Activation density: 0.088%
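Activation density is usually read as the fraction of token positions on which the feature fires. The page does not define it, so the sketch below assumes "fires" means an activation strictly above a threshold of 0. `activation_density` is a hypothetical helper name. Under this reading, 0.088% corresponds to roughly 88 active positions per 100,000 tokens.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (0 by default).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return active / total


# 88 active positions out of 100,000 tokens -> density of 0.00088, i.e. 0.088%.
density_pct = 100 * activation_density([1.0] * 88 + [0.0] * 99_912)
```

Streaming over an iterable keeps memory flat, which matters when density is measured over millions of tokens.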