    Explanations

    the beginning of new sections or thoughts in a text

    before citations or references

    Negative Logits
    Token               Logit
    "-------"           -0.72
    " betweenstory"     -0.63
    "calaure"           -0.58
    "UrlResolution"     -0.57
    "findOrFail"        -0.57
    "owatt"             -0.56
    "RegressionTest"    -0.56
    (blank)             -0.54
    "bhan"              -0.54
    "odori"             -0.54
    Positive Logits
    Token                  Logit
    " disambiguazione"     0.84
    " nahilalakip"         0.83
    " autorytatywna"       0.80
    ":✨"                   0.79
    "Autoritní"            0.79
    "principalColumn"      0.78
    " Roskov"              0.77
    "awtextra"             0.75
    " chi̍t"                0.74
    " مُعرِّف"               0.74
    Activation Density: 0.152%

    No Known Activations