INDEX
    Explanations

    verbs indicating intensity or strength

    pronouns and their associated structures, indicating actions and descriptions

    NEGATIVE LOGITS
    "ammy"              -0.69
    " Helpful"          -0.67
    " Angola"           -0.66
    "eworthy"           -0.66
    " Distance"         -0.64
    " Catalyst"         -0.63
    "ugal"              -0.63
    " Alright"          -0.61
    "â̦â̦â̦â̦â̦â̦â̦â̦"          -0.60
    " Dynamics"         -0.59
    POSITIVE LOGITS
    " practically"      0.83
    ">]"                0.80
    " barely"           0.77
    " warrant"          0.76
    " unrecogn"         0.76
    " scarcely"         0.75
    " deserve"          0.75
    " hardly"           0.74
    " unus"             0.74
    " almost"           0.73
    Activation Density: 0.094%

    No Known Activations