INDEX
    Explanations

    places or locations

    specific punctuation marks, particularly periods and commas

    Negative Logits
     dece
    -0.74
     tremend
    -0.68
    覚醒
    -0.68
     everywhere
    -0.66
    isse
    -0.66
     coy
    -0.65
     mur
    -0.64
     defe
    -0.63
     revol
    -0.63
     monopol
    -0.63
    Positive Logits
     Additionally
    1.11
    <|endoftext|>
    1.06
     Afterwards
    1.05
     However
    1.00
     Alternatively
    0.98
     Previously
    0.94
     Furthermore
    0.94
     Moreover
    0.93
     According
    0.92
     Along
    0.92
    Act Density 0.564%

    No Known Activations