    Explanations

    comparative adjectives indicating size, speed, and weight

    Negative Logits
     T      -0.80
     p      -0.73
     Z      -0.72
     G      -0.72
     mit    -0.72
     Go     -0.69
     Ar     -0.68
     S      -0.68
     Al     -0.67
     F      -0.67
    Positive Logits
     leſs         1.88
     myſelf       1.65
     happier      1.54
     healthier    1.52
     shallo       1.51
     easier       1.51
     quieter      1.50
     thicker      1.50
     thinner      1.50
     heavier      1.50
    Activation Density: 0.164%

    No Known Activations