INDEX
Explanations
names of people and their associations with various roles
Negative Logits
Token       Logit
retire      -0.20
widow       -0.19
decades     -0.19
Widow       -0.17
wives       -0.17
retired     -0.17
wid         -0.17
elderly     -0.17
veteran     -0.16
retirees    -0.16
Positive Logits
Token       Logit
young       0.21
young       0.20
boyfriend   0.18
underage    0.17
girl        0.17
boy         0.17
@student    0.17
preco       0.17
jeune       0.16
Young       0.16
Activations Density: 0.682%