    Explanations

    complex nested data structures or syntactic patterns in code

    Negative Logits
      Token     Logit
      " —"      -0.67
      "ors"     -0.66
      ","       -0.62
      " L"      -0.60
      " he"     -0.59
      " H"      -0.57
      " He"     -0.57
      " A"      -0.56
      " N"      -0.56
      " S"      -0.55

    Positive Logits
      Token               Logit
      "OGND"              1.13
      " autorytatywna"    1.07
      " reaſon"           1.07
      " purpoſe"          1.06
      " poffible"         1.01
      " myſelf"           1.00
      " pleaſure"         0.99
      " neceffary"        0.98
      " neceſſ"           0.97
      "+#+#"              0.95

    Activation density: 0.024%

    No known activations.