INDEX
    Explanations

    phrases related to negative or critical opinions about something

    negative expressions or sentiments

    NEGATIVE LOGITS
     Shank          -0.66
     strengthened   -0.65
     Reloaded       -0.64
     Tik            -0.63
     hardness       -0.59
     Irwin          -0.59
    DERR            -0.59
    untled          -0.59
     Nicarag        -0.59
     hardened       -0.58
    POSITIVE LOGITS
    recomm          0.96
    mom             0.93
    distance        0.93
    years           0.92
    favorite        0.91
    diff            0.91
    dri             0.90
    prison          0.90
    dist            0.89
    comments        0.88
    Act Density 0.109%

    No Known Activations