    Explanations

    words related to identity and affiliation

    Negative Logits
    Token        Logit
    aurus        -0.16
    berra        -0.16
    adays        -0.16
    opoulos      -0.15
    οι           -0.15
    -vous        -0.15
    odore        -0.14
     anale       -0.14
    away         -0.14
    AILABLE      -0.14
    Positive Logits
    Token        Logit
    umes         0.17
    ures         0.17
    DN           0.15
    ité          0.14
    isc          0.14
    _tooltip     0.14
    Åį           0.14
    BCM          0.14
    tbl          0.14
    adi          0.13
    Activation Density: 0.493%

    No Known Activations