    Explanations

    phrases indicating extremity or intensity of action or opinion

    phrases indicating actions or opinions that go to extremes or limits

    Negative Logits
    Token            Logit
    "itu"            -0.77
    "rio"            -0.72
    "odes"           -0.70
    " Puppet"        -0.69
    "cyclopedia"     -0.68
    "tu"             -0.66
    "otten"          -0.66
    " Pend"          -0.64
    "ĭ"              -0.63
    "odd"            -0.63

    Positive Logits
    Token            Logit
    " lengths"        0.77
    " differently"    0.76
    " stride"         0.71
    " unnoticed"      0.68
    " embro"          0.66
    " persu"          0.65
    " step"           0.64
    " overr"          0.62
    "HER"             0.61
    " rug"            0.60

    Activation Density: 0.039%

    No Known Activations