    Explanations

    adjectives that express size and feelings

    Negative Logits
    Token         Logit
     np           -0.66
    aho           -0.65
    ugal          -0.63
    {"            -0.63
     Alright      -0.62
    AAA           -0.62
     Dynamics     -0.60
     Afgh         -0.60
    romeda        -0.59
     Yard         -0.59
    Positive Logits
    Token         Logit
     practically  0.83
     even         0.81
     barely       0.81
     scarcely     0.79
     hardly       0.78
    ruciating     0.77
     almost       0.74
     unus         0.70
     virtually    0.68
    >]            0.66
    Activation Density: 0.268%

    No Known Activations