INDEX
    Explanations

    adverbs that modify intensity or manner

    Negative Logits
    "ity"            -0.15
    "trusted"        -0.14
    " admittedly"    -0.14
    " Count"         -0.14
    "kke"            -0.14
    ":checked"       -0.14
    "ausal"          -0.13
    "оÑī"            -0.13
    "eniz"           -0.13
    " muted"         -0.13
    Positive Logits
    "ono"            0.16
    "ingly"          0.16
    " accurate"      0.16
    " different"     0.15
    "omb"            0.15
    " aware"         0.15
    "ео"             0.15
    "ovich"          0.14
    "obi"            0.14
    " mát"           0.14
    Act Density 0.068%

    No Known Activations