    Explanations

    statements indicating a necessity or recommendation to consider certain actions or decisions

    phrases suggesting recommendations or advice

    Negative Logits
    "hiba"       -0.72
    "ilian"      -0.65
    "lish"       -0.59
    "ivism"      -0.59
    " Poverty"   -0.59
    " Became"    -0.58
    "ccording"   -0.57
    " Dou"       -0.55
    "oub"        -0.55
    "ynski"      -0.55
    Positive Logits
    "ij士"        0.75
    "ãĤ¦ãĤ¹"      0.72
    "gotten"     0.70
    "dos"        0.69
    "lessly"     0.69
    "ate"        0.68
    "reprene"    0.65
    "ter"        0.64
    "TEXTURE"    0.63
    " to"        0.61
    Act Density 0.060%

    No Known Activations