INDEX
    Explanations

    terms related to political and social contexts

    Negative Logits
    "est"      -0.17
    "ORB"      -0.16
    "ive"      -0.16
    "erez"     -0.16
    "ables"    -0.15
    "IRD"      -0.15
    "/fast"    -0.15
    "able"     -0.15
    "ird"      -0.14
    "エル"     -0.14
    Positive Logits
    " speaking"    0.31
    " sound"       0.27
    "sound"        0.27
    " Speaking"    0.26
    "-speaking"    0.25
    " SOUND"       0.24
    "Sound"        0.23
    "Speaking"     0.23
    " minded"      0.23
    " Sound"       0.23
    Act Density 0.059%

    No Known Activations