INDEX
    Explanations

    adjectives describing a degree or extent

    instances of the word "rather" in various contexts

    Negative Logits
    "elsen"     -1.00
    "ppo"       -0.80
    "ruary"     -0.76
    "ocaust"    -0.76
    "amba"      -0.75
    "illary"    -0.73
    "mberg"     -0.72
    "wana"      -0.71
    "ahu"       -0.69
    "DD"        -0.69
    Positive Logits
    " amusing"         0.89
    " inconvenient"    0.85
    " unimagin"        0.83
    " unpleasant"      0.82
    " pricey"          0.80
    " awkward"         0.80
    " incons"          0.77
    " complicated"     0.76
    " unusual"         0.76
    " harmless"        0.76
    Activation Density: 0.016%

    No Known Activations