INDEX
    Explanations

    words that express necessity or demand

    Negative Logits
    åħ            -0.15
    irt           -0.15
    ensen         -0.15
     Perez        -0.14
    ondon         -0.14
    pending       -0.14
     Pearce       -0.14
     ausp         -0.13
    agem          -0.13
     justified    -0.13

    Positive Logits
    bjerg         0.18
    umba          0.15
    bservice      0.15
     aalborg      0.15
     fkk          0.15
    _Framework    0.14
    avir          0.14
     Bald         0.14
    immune        0.14
    vig           0.14
    Act Density 0.002%

    No Known Activations