INDEX
    Explanations

    phrases starting with a dash followed by a number

    negative sentiments or critiques

    NEGATIVE LOGITS
    irlf      -0.79
    ecause    -0.77
    ometimes  -0.74
    ividual   -0.74
     withd    -0.73
    lished    -0.71
    ancial    -0.66
    ashtra    -0.66
    fman      -0.64
    ij士      -0.62
    POSITIVE LOGITS
    -      2.02
    âĢij   1.40
    âĢIJ   1.35
    -,     1.14
    -[     1.12
    -'     1.12
    -$     1.09
    "-     1.02
    '-     0.91
    -.     0.89
    Activation Density 0.390%

    No Known Activations