INDEX
    Explanations

    expressions of disagreement or disbelief

    Negative Logits
    " Broadcasting"   -0.69
    " CLR"            -0.68
    " Trinidad"       -0.68
    " Rosenthal"      -0.68
    " scattering"     -0.67
    " adolesc"        -0.66
    "ickets"          -0.64
    " Transparency"   -0.64
    " friction"       -0.64
    " Avalon"         -0.64
    Positive Logits
    "own"        1.12
    "ï¸ı"        1.04
    "should"     1.01
    "agree"      0.99
    "swer"       0.99
    "must"       0.98
    "tal"        0.96
    "ve"         0.93
    "audi"       0.91
    " deserve"   0.88
    Act Density 0.142%

    No Known Activations