INDEX
    Explanations

    Type-safe and /p/, /b/ sounds

    NEGATIVE LOGITS
    0.78
    ли
    0.75
    lymph
    0.73
    tri
    0.72
    方針
    0.71
    сний
    0.71
    table
    0.71
    swarm
    0.71
    ्य
    0.70
    sided
    0.70
    POSITIVE LOGITS
    0.79
    A
    0.79
    b
    0.76
    N
    0.76
    O
    0.76
    და
    0.70
     
    0.70
    AB
    0.70
     as
    0.70
    𝘼
    0.69
    Activation Density 0.000%

    No Known Activations