INDEX
    Explanations

    references to individuals and their affiliations or titles

    punctuation and structural elements

    Negative Logits
    Token                   Logit
    "niſſe"                 -0.86
    "ſſung"                 -0.84
    "RegressionTest"        -0.83
    " imagui"               -0.82
    " ſind"                 -0.82
    "abestanden"            -0.81
    " fashiola"             -0.79
    " disambiguazione"      -0.78
    "<unused20>"            -0.77
    "[@BOS@]"               -0.77
    Positive Logits
    Token                   Logit
    "the"                   0.61
    " the"                  0.60
    " our"                  0.58
    " selaku"               0.46
    "our"                   0.36
    " who"                  0.35
    " your"                 0.32
    " their"                0.32
    "min"                   0.30
    "ti"                    0.30
    Activation Density: 0.088%

    No Known Activations