INDEX
    Explanations

    phrases related to likelihood or potentiality

    phrases expressing perceptions or opinions

    Negative Logits
    ilts            -0.76
    zb              -0.76
    estern          -0.75
    ffen            -0.71
    ainers          -0.70
    cised           -0.66
    ogi             -0.65
    opers           -0.65
    ests            -0.64
    ourse           -0.62
    Positive Logits
     innocuous      1.03
     awfully        1.03
     oddly          0.98
     destined       0.93
     strangely      0.93
     unlikely       0.92
     tailor         0.91
     unstoppable    0.91
     to             0.90
     harmless       0.90
    Activation Density: 0.061%

    No Known Activations