INDEX
    Explanations

    adverbs and their forms, particularly those that convey manner or degree

    Negative Logits
    Token       Logit
    (blank)     -0.17
    "pu"        -0.16
    "sg"        -0.15
    "sto"       -0.15
    "cu"        -0.15
    "iful"      -0.15
    "uai"       -0.14
    "quil"      -0.14
    "824"       -0.14
    "tron"      -0.14
    Positive Logits
    Token          Logit
    "referrer"     0.16
    "esch"         0.15
    "tics"         0.15
    "uger"         0.15
    " Speaking"    0.14
    "ãĤ·ãĤ¢"       0.14
    "éric"         0.14
    "spe"          0.14
    " اÙĦاع"       0.14
    "ktion"        0.14
    Act Density 0.456%

    No Known Activations