INDEX
Explanations
references to the origins and identities of individuals
Negative Logits
ahl         -0.14
oulder      -0.14
essor       -0.14
Henderson   -0.14
fertile     -0.14
ifs         -0.13
Spirits     -0.13
нод         -0.13
occupied    -0.12
atl         -0.12
Positive Logits
h           0.40
native      0.39
hail        0.38
origin      0.38
originally  0.34
来自        0.34
natives     0.33
born        0.33
origins     0.33
origin      0.32
Activations Density 0.198%