INDEX
    Explanations

    words or phrases related to arguments or debates

    Negative Logits
    Token         Logit
    "TRS"         -0.15
    "chen"        -0.15
    "akter"       -0.15
    "igans"       -0.15
    "zial"        -0.15
    " closure"    -0.15
    "closure"     -0.15
    "chsel"       -0.14
    "ral"         -0.14
    "eh"          -0.14
    Positive Logits
    Token         Logit
    "uably"       0.35
    "entin"       0.32
    "onaut"       0.31
    "entine"      0.29
    "uing"        0.28
    "entina"      0.28
    "yle"         0.27
    "uable"       0.27
    "ued"         0.27
    "inine"       0.26
    Act Density 0.007%

    No Known Activations