INDEX
Explanations
ethnicity, race, or specific titles
book titles and descriptions
Negative Logits
на  0.84
но  0.78
羅  0.77
ார்  0.76
Ꮩ  0.71
Cheng  0.71
즈  0.71
مر  0.70
罗  0.70
िक  0.70
Positive Logits
a  0.88
Protestant  0.81
(blank)  0.80
{  0.76
ainult  0.74
A  0.73
στην  0.73
ovvero  0.73
我会  0.73
ngunit  0.73
Activations Density 0.000%