INDEX
    Explanations

    references to specific groups or entities enclosed in square brackets

    references to groups or entities enclosed in brackets

    Negative Logits
    " redu"        -0.80
    "edIn"         -0.71
    " Elys"        -0.68
    " pens"        -0.66
    "ramid"        -0.65
    " therap"      -0.65
    " Seym"        -0.63
    "otted"        -0.63
    " handlers"    -0.62
    "eday"         -0.61
    Positive Logits
    "sic"          1.55
    "?]"           1.38
    "!]"           1.25
    ":]"           1.16
    "emphasis"     1.15
    "…]"           1.12
    "REDACTED"     1.11
    "]("           1.06
    "%]"           1.06
    "laughs"       1.05
    Act Density 0.038%

    No Known Activations