INDEX
    Explanations

    explicit expressions of strong opinions

    expressions of strong sentiments or reactions

    Negative Logits
    "edIn"        -0.74
    "Wik"         -0.65
    "_-"          -0.63
    "Ct"          -0.63
    "indle"       -0.62
    "iba"         -0.61
    "yrinth"      -0.61
    "Interface"   -0.59
    " Cla"        -0.59
    "anian"       -0.58
    Positive Logits
    " impression"   0.84
    " advice"       0.79
    "icum"          0.79
    " Rosenstein"   0.70
    "onsense"       0.69
    " counsel"      0.67
    "summary"       0.66
    "chance"        0.66
    "arsh"          0.65
    " amnesty"      0.64
    Activation Density: 0.213%

    No Known Activations