Explanations
references to the Arabic language or its cultural significance
Negative Logits
ertodd      -1.05
hov         -0.88
lessly      -0.87
lasses      -0.80
pload       -0.76
ups         -0.73
vg          -0.72
Panasonic   -0.71
kj          -0.69
olicy       -0.69
Positive Logits
language    0.84
numer       0.83
ophone      0.80
ophobia     0.80
Language    0.79
س           0.79
Corpus      0.77
proverb     0.77
iyah        0.76
ica         0.76
Activations Density: 0.005%