INDEX
    Explanations

    adverbs ending in 'ly'

    Negative Logits
    Token             Logit
    "ilater"          -0.83
    "senal"           -0.75
    " hemor"          -0.71
    " respectively"   -0.70
    "afety"           -0.70
    " comprom"        -0.65
    "ERA"             -0.65
    "itivity"         -0.65
    " corrective"     -0.64
    " Annotations"    -0.64

    Positive Logits
    Token             Logit
    "rics"            0.98
    "ffe"             0.88
    "zed"             0.88
    "puff"            0.88
    "sis"             0.82
    "upe"             0.81
    "tics"            0.81
    "pha"             0.79
    "waters"          0.77
    "clad"            0.76

    Act Density 0.029%

    No Known Activations
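
For readers unfamiliar with the Act Density figure above, here is a minimal sketch of how such a number is commonly derived, assuming it means the fraction of token positions on which the feature's activation is strictly positive. The function name `activation_density` and the synthetic data are hypothetical and are not taken from this page or from any particular tool.

```python
from typing import Iterable


def activation_density(activations: Iterable[float]) -> float:
    """Return the fraction of token positions with a strictly positive activation.

    Assumption: "density" means (active positions) / (all positions).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > 0:
            active += 1
    if total == 0:
        raise ValueError("no activations given")
    return active / total


# Synthetic illustration only: 29 active positions out of 100,000.
synthetic = [0.0] * 99_971 + [1.5] * 29
density = activation_density(synthetic)
print(f"{density:.3%}")  # 0.029%
```

Under that assumption, a density of 0.029% corresponds to roughly 29 active positions per 100,000 tokens, which would make this a very sparse feature.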