INDEX
    Explanations

    comparative adjectives

    instances of the word "relatively."

    Negative Logits
    Token       Logit
    "inis"      -0.77
    "will"      -0.75
    "ses"       -0.72
    " Landing"  -0.71
    "abad"      -0.69
    "core"      -0.69
    "PT"        -0.68
    "tein"      -0.68
    " Polo"     -0.67
    "arta"      -0.67
    Positive Logits
    Token             Logit
    " unaffected"     0.91
    " innocuous"      0.88
    " unchanged"      0.88
    " scarce"         0.88
    " insignificant"  0.87
    " insensitive"    0.85
    " inexpensive"    0.83
    " harmless"       0.83
    " unpop"          0.81
    " tame"           0.81
    Activation Density: 0.009%

    No Known Activations