INDEX
    Explanations

    phrases indicating perception or interpretation

    phrases that indicate perception or opinion

    Negative Logits
    oller         -0.68
    atted         -0.67
    rolet         -0.67
    rol           -0.66
    rower         -0.66
    atl           -0.66
    LINE          -0.64
    zens          -0.64
    ãĥ¥           -0.63
    wordpress     -0.63
    Positive Logits
    pires         0.96
     opposed      0.94
    pired         0.86
     follows      0.85
     belonging    0.82
    criptions     0.78
     synonymous   0.77
    pers          0.76
     well         0.75
     expend       0.75
    Act Density 0.095%

    No Known Activations