INDEX
Explanations
elephants, zebras, apes, and other animals
Negative Logits
تنا    0.97
tım    0.96
tion    0.94
션을    0.92
ture    0.90
تهم    0.89
tır    0.87
tia    0.84
tà    0.83
tj    0.83
Positive Logits
elephants    1.29
safari    1.29
🐘    1.27
dolphins    1.25
animais    1.23
mammals    1.22
elef    1.19
그러나    1.19
giraffe    1.17
животных    1.16
Activations Density 0.176%