Explanations
references to nationalities or ethnicities
Negative Logits
rungsseite    -0.54
stufe         -0.47
ValueStyle    -0.45
typelib       -0.45
compaction    -0.43
Gelände       -0.43
گردد          -0.43
azgo          -0.43
dė            -0.41
liturgy       -0.41
Positive Logits
Mexican     0.92
Italian     0.90
Japanese    0.89
Russian     0.88
Spanish     0.86
German      0.85
Indian      0.85
French      0.85
Russian     0.84
Chinese     0.84
Activations Density: 0.359%