INDEX
    Explanations

    verbs related to thinking or opinions

    Negative Logits
    "yna"          -0.86
    "Redditor"     -0.73
    "oubted"       -0.72
    "iona"         -0.72
    "ulia"         -0.70
    "qqa"          -0.66
    "inar"         -0.65
    "etermined"    -0.65
    "lehem"        -0.64
    "acted"        -0.64
    Positive Logits
    " differently"    1.08
    " otherwise"      0.93
    " alike"          0.84
    " twice"          0.84
    " they"           0.84
    " about"          0.81
    "lessly"          0.76
    " that"           0.72
    "fulness"         0.69
    "fully"           0.68
    Act Density 0.078%

    No Known Activations