Explanations
nationalities and their occupations
Negative Logits
Token        Value
language     1.00
language     0.90
地区          0.87
росій        0.77
شریف         0.76
drama        0.76
Russian      0.75
languages    0.75
scaping      0.75
地区的        0.74
Positive Logits
Token        Value
ಟ            0.77
वंश          0.75
eem          0.74
strut        0.73
inroads      0.73
зу           0.73
ट्यूब         0.72
esha         0.70
tenure       0.70
chemists     0.70
Activations Density: 0.055%