INDEX
    Explanations

    instances of proper nouns and their related titles or affiliations

    Negative Logits
        Token          Logit
        " Florian"     -0.19
        "ɵ"            -0.16
        "coli"         -0.16
        "ñas"          -0.16
        "èĤĸ"          -0.15
        "461"          -0.14
        " Sanayi"      -0.14
        "riority"      -0.14
        "èĢ"           -0.14
        "ilee"         -0.14
    Positive Logits
        Token            Logit
        "ival"            0.22
        "udson"           0.20
        " Vander"         0.19
        " Guil"           0.19
        "elson"           0.19
        "adir"            0.18
        "ilton"           0.18
        "ildo"            0.18
        " Lu"             0.18
        " Wellington"     0.18
    Activation Density: 0.015%

    No Known Activations