INDEX
    Explanations

    high activation values across various sections of structured data

    NEGATIVE LOGITS
    " Бли"                    -0.66
    " control"                -0.64
    "aktery"                  -0.63
    "Tok"                     -0.62
    (whitespace-only token)   -0.60
    " Bue"                    -0.60
    " hela"                   -0.59
    "ので"                     -0.59
    " Towel"                  -0.58
    "bule"                    -0.58
    POSITIVE LOGITS
    "9"         2.02
    " NINE"     1.43
    " Ninth"    1.42
    " ninth"    1.32
    " Nine"     1.31
    "nine"      1.29
    "۹"         1.29
    " ninety"   1.28
    "ninth"     1.25
    " nine"     1.23
    Activation Density: 0.709%

    No Known Activations