INDEX
    Explanations

    references to personal experiences and emotional appeals

    Negative Logits
    Token            Logit
    "yre"            -0.17
    "elin"           -0.17
    "921"            -0.16
    "'gc"            -0.16
    "ritz"           -0.14
    "rada"           -0.14
    "Instantiate"    -0.14
    "Rx"             -0.14
    "RP"             -0.14
    "427"            -0.13
    Positive Logits
    Token            Logit
    "ewith"          0.16
    "istrov"         0.15
    "Äĥn"            0.15
    " Lump"          0.15
    "ben"            0.15
    " Reply"         0.15
    "ponsor"         0.14
    " Burk"          0.14
    " Esc"           0.14
    "apers"          0.14
    Activation Density: 0.003%

    No Known Activations