    NEGATIVE LOGITS
    Token          Value
    "ang"          1.21
    "in"           1.14
    "am"           1.04
    "en"           1.03
    " a"           1.00
    "ام"           1.00
    "та"           0.94
    (not shown)    0.89
    "ak"           0.89
    "اك"           0.89
    POSITIVE LOGITS
    Token          Value
    "กับ"          1.15
    " monkey"      1.05
    "Monkey"       1.05
    " Monkey"      0.98
    "ב"            0.97
    " Monkeys"     0.95
    "}"            0.94
    ")"            0.87
    (not shown)    0.87
    "]"            0.86
    Activation density: 0.003%

    No Known Activations