    Explanations

    adjectives related to intensity or enhancement

    Negative Logits
        " Goo"       -0.63
        "orers"      -0.62
        "CLA"        -0.58
        " zip"       -0.56
        "opers"      -0.55
        " craw"      -0.55
        "ANGE"       -0.55
        "emp"        -0.55
        "lore"       -0.55
        " CoC"       -0.53

    Positive Logits
        " by"             1.23
        "by"              0.91
        " BY"             0.88
        "By"              0.87
        " By"             0.83
        " anew"           0.81
        "igated"          0.77
        " exponentially"  0.74
        " aback"          0.73
        ".</"             0.72

    Activation Density: 0.148%

    No Known Activations