INDEX
    Explanations

    elephants, zebras, apes, and other animals

    Negative Logits
    Token       Value
    تنا         0.97
    tım         0.96
    tion        0.94
    션을        0.92
    ture        0.90
    تهم         0.89
    tır         0.87
    tia         0.84
                0.83
    tj          0.83
    Positive Logits
    Token       Value
     elephants  1.29
     safari     1.29
    🐘          1.27
     dolphins   1.25
     animais    1.23
     mammals    1.22
     elef       1.19
    그러나      1.19
     giraffe    1.17
     животных   1.16
    Activation Density: 0.176%

    No Known Activations