INDEX
    Explanations
    Negative Logits
    Token (quoted)    Logit
    "{"               0.91
    " ר"              0.81
    "."               0.80
    " Николай"        0.73
    "she"             0.73
    " stato"          0.72
    "ние"             0.70
    " ക്"              0.69
    " gaya"           0.68
    " thanh"          0.68
    Positive Logits
    Token (quoted)    Logit
    "س"               1.25
    " branches"       1.19
    "ر"               1.11
    "branches"        1.05
    " Branches"       1.04
    " Branch"         0.97
    "ोन"              0.90
    "oos"             0.87
    "وک"              0.85
    "ap"              0.83
    Activation Density: 0.008%

    No Known Activations