INDEX
    Explanations

    mentions of trust and abuse in relational contexts

    Negative Logits
    idon          -0.17
    legg          -0.15
    ácil          -0.14
    aden          -0.14
     Died         -0.14
    ategories     -0.14
    heel          -0.14
    adian         -0.13
    scribe        -0.13
    odo           -0.13

    Positive Logits
    nave          0.15
    -none         0.14
    rey           0.14
    upal          0.13
    éľĩ           0.13
     Plaza        0.13
    endir         0.13
    imeType       0.13
     covid        0.13
    gam           0.13
    Activation Density: 0.018%

    No Known Activations