INDEX
    Explanations

    linguistic and structural elements in written text

    Negative Logits
    Token           Logit
    "lus"           -0.18
    "lrt"           -0.17
    "enso"          -0.16
    "laz"           -0.16
    "ventus"        -0.15
    "lain"          -0.15
    "Inspectable"   -0.15
    "abox"          -0.15
    "ế"             -0.14
    "agini"         -0.14

    Positive Logits
    Token           Logit
    " au"           0.40
    " aux"          0.40
    " al"           0.32
    " Aux"          0.29
    " alla"         0.28
    " ao"           0.28
    " aos"          0.28
    " Au"           0.28
    "aux"           0.27
    "au"            0.25
    Act Density 0.045%

    No Known Activations