    Negative Logits
                  0.88
    ]             0.84
    ä             0.84
    ،             0.84
                  0.82
     Breeders     0.80
     Court        0.77
                  0.76
     Boards       0.75
     Research     0.73
    Positive Logits
    ת       1.27
    м       1.24
    на      1.22
    ى       1.21
    و       1.20
    ка      1.18
    י       1.18
    ле      1.13
    it      1.13
    ли      1.09
    Act Density 0.001%

    No Known Activations