Explanations
mentions of people's places of origin
words related to a person's place of origin or nationality
Negative Logits
ushima    -0.72
haar      -0.69
creation  -0.68
usters    -0.67
clusive   -0.66
arent     -0.65
erial     -0.65
idden     -0.65
oho       -0.64
itness    -0.64
Positive Logits
Ezra      0.75
Shaun     0.75
Reggie    0.75
Liam      0.73
Joined    0.73
Zac       0.72
Ally      0.71
Colin     0.71
Tav       0.68
Joey      0.68
Activations Density: 0.131%