    Explanations

    words related to challenges, controversy, and risk

    punctuation marks and their usage in the text

    Negative Logits
    ãĤ´ãĥ³    -0.90
    izen      -0.67
    args      -0.64
    ãĥį       -0.63
    ãĥ¯       -0.62
    ¬¼        -0.61
    iren      -0.61
    atars     -0.60
    isi       -0.60
    acci      -0.60
    Positive Logits
     yeah         1.23
     whereas      1.07
     uh           0.98
     blah         0.97
     frankly      0.96
     [            0.94
     obviously    0.94
     basically    0.94
     because      0.92
     but          0.90
    Activation Density: 0.333%

    No Known Activations