INDEX
Explanations
No Explanations Found
Negative Logits
ри        1.31
ur        1.27
.         1.27
d         1.19
u         1.15
b         1.11
ell       1.09
ine       1.05
ouring    1.02
ı         1.02
Positive Logits
ن               1.44
نك              1.09
ات              1.09
ت               1.05
يق              1.01
konz            0.98
presente        0.96
independente    0.96
dominante       0.95
sofistic        0.95
Activations Density 0.000%