Explanations
mentions of where someone is originally from
references to 'native' speakers or individuals
Negative Logits
Token       Logit
ATA         -0.82
attr        -0.81
apego       -0.78
ammy        -0.75
ATHER       -0.74
=-=-=-=-    -0.74
ENA         -0.73
earchers    -0.71
TOP         -0.70
eper        -0.69
Positive Logits
Token            Logit
born             0.86
native           0.85
americ           0.77
Advertisement    0.74
spe              0.73
Instruments      0.73
native           0.71
oise             0.70
inhabit          0.69
izations         0.69
Activations Density: 0.009%