INDEX
Explanations
names of people or entities, particularly with a focus on political figures and institutions
names of individuals or groups associated with political or social contexts
Negative Logits
Token    Logit
er       -1.03
eros     -0.97
uras     -0.92
eric     -0.87
shire    -0.87
urus     -0.86
oise     -0.84
eur      -0.82
ersen    -0.81
uran     -0.80
Positive Logits
Token    Logit
ãģĨ      0.61
à¨       0.57
å½       0.56
大       0.55
ãģ£      0.54
Ùħ       0.52
åį       0.52
èĥ       0.51
ÙĴ       0.51
ä¼       0.51
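The positive-logit tokens above look like byte-level BPE token strings rather than readable text. The page does not name the tokenizer, so the following is only a sketch under the assumption that it uses the GPT-2-style byte-to-character table. The helper name `decode_token` is hypothetical. It maps each displayed character back to its byte and then tries to read the bytes as UTF-8.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and up.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token):
    """Hypothetical helper: turn a displayed token string back into text.

    Returns the token unchanged if it contains characters outside the
    byte table (assumed to be already decoded), and None if the bytes
    are only a fragment of a multi-byte UTF-8 character.
    """
    if any(ch not in CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


for tok in ["ãģĨ", "ãģ£", "à¨", "大", "er"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, a token that decodes to None is a partial multi-byte sequence: it is the leading bytes of a character that the tokenizer completes with a following token, so it has no standalone reading.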
Activations Density 0.237%