Explanation: negative charge or consequences
Negative Logits
Token      Value
ków        2.33
purse      2.19
lardan     2.05
लिये        2.02
væ         1.96
v          1.95
sa         1.92
ни         1.91
jumlah     1.90
τες        1.87
Positive Logits
Token      Value
ت          2.67
학과       1.90
خ          1.77
बढ़ते        1.76
ing        1.75
تف         1.75
没有       1.73
تس         1.72
تمر        1.67
afar       1.66
Activation Density: 0.784%
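Activation density is commonly reported as the percentage of token positions on which a feature's activation is above zero. That definition is an assumption here, since this page does not state it. The minimal sketch below shows the arithmetic using a hypothetical helper, `activation_density`. Under that definition, a density of 0.784% corresponds to roughly 784 firing positions per 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`.

    Assumption: "fires" means activation > 0; the threshold is a
    hypothetical parameter, not taken from the dashboard.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical data: 784 firing positions out of 100,000 tokens.
acts = [0.0] * (100_000 - 784) + [1.0] * 784
print(f"{activation_density(acts):.3f}%")  # prints "0.784%"
```

With a very sparse feature like this one, the density is dominated by the large number of zero activations, so a small change in the firing count moves the percentage only slightly.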