INDEX
Explanations
terms related to religious or cultural observances and their significance
Negative Logits
.tb         -0.16
avad        -0.15
.nb         -0.15
vind        -0.15
adj         -0.15
ya          -0.14
NB          -0.14
alink       -0.14
anz         -0.14
iterable    -0.14
Positive Logits
Polish      0.20
Poland      0.19
acja        0.17
ń           0.16
ewn         0.16
iew         0.16
polish      0.16
ę           0.15
Pols        0.15
ęd          0.15
Activations Density 0.340%