INDEX
Explanations
intelligence and cleverness
Negative Logits
ন  1.45
et  1.25
j  1.25
khảo  1.17
weds  1.16
ॅमिली  1.16
াইভ  1.15
ান  1.13
aliments  1.13
humides  1.13
Positive Logits
な  1.27
ية  1.20
ता  1.18
はなく  1.14
соблю  1.13
Wise  1.09
lógica  1.09
이고  1.08
consape  1.08
이라는  1.06
Activation Density: 0.476%