INDEX
    Explanations

    words related to speaking or discourse

    Negative Logits
    nap           -0.08
    ongs          -0.07
    'gc           -0.07
    las           -0.07
    wan           -0.07
    printStats    -0.07
    outh          -0.06
    Ø«ÛĮر         -0.06
    strap         -0.06
    Winvalid      -0.06
    Positive Logits
    indle         0.08
    tember        0.08
    ertino        0.07
    ake           0.07
    iment         0.07
    й             0.07
     Spe          0.07
     spe          0.06
    ars           0.06
    heat          0.06
    Act Density 0.008%

    No Known Activations