INDEX
    Explanations

    phrases that reference regular or repeated actions

    Negative Logits
    linger     -0.18
    ickey      -0.15
    ligt       -0.15
    awns       -0.14
    ays        -0.14
    angu       -0.14
    ux         -0.14
     ç©        -0.14
    nable      -0.14
    aml        -0.13
    Positive Logits
     basis         0.25
     whim          0.24
     regular       0.23
     scale         0.23
    basis          0.21
     consistent    0.20
     sho           0.20
    scale          0.19
     dime          0.18
     Sho           0.18
    Activation Density: 0.038%

    No Known Activations