Explanations
Italian abstract nouns and Sapienza
Negative Logits
uxe     0.41
یره     0.39
employed        0.39
اع      0.38
历      0.38
Our     0.37
സ്      0.36
াসী     0.36
ASF     0.36
Our     0.36
Positive Logits
ità     0.73
tà      0.61
degli   0.57
Sap     0.57
Università      0.57
à       0.55
sap     0.54
Sap     0.54
À       0.48
ilità   0.48
Activations Density 0.006%