INDEX
    Explanations

    a specific pattern of repeated symbols and formatting indicators in the text

    Negative Logits
    gridx           -0.71
     Rond           -0.69
     Gren           -0.67
    Fron            -0.67
     Fron           -0.67
    EndContext      -0.66
    zeera           -0.65
    )");\n          -0.64
     dign           -0.63
     consultato     -0.63
    Positive Logits
     *               1.64
    !*               1.51
    :*               1.45
    ?*               1.42
    >*               1.41
    -*               1.41
    ()*              1.39
     $*$             1.36
    .*               1.34
     $*              1.34
    Act Density 0.873%

    No Known Activations