Explanations
proper nouns, specifically names of people and their associated attributes
Negative Logits
Putih          -0.57
Hitam          -0.55
financeira     -0.51
parken         -0.50
Internasional  -0.50
econômica      -0.49
Mulher         -0.48
hallen         -0.47
Kerk           -0.47
Ström          -0.45
Positive Logits
mann      0.66
berger    0.53
ke        0.50
berg      0.48
heck      0.47
hardt     0.47
seaborn   0.47
lma       0.46
hold      0.46
mann      0.45
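Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto each token's unembedding vector and keeping the k most negative and k most positive results. The sketch below illustrates that idea under this assumption only; the function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical, and the page does not state exactly how its own values were computed (for example, whether any normalization was applied).

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    # Hypothetical helper: dot product of the feature's decoder direction
    # with each token's unembedding vector gives that token's logit effect.
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort ascending by effect; the head holds the most negative tokens,
    # the reversed list's head holds the most positive tokens.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return bottom, top


if __name__ == "__main__":
    # Toy 2-d vectors, purely illustrative.
    decoder = [1.0, 0.0]
    unembed = {"mann": [0.6, 0.1], "berg": [0.4, 0.0],
               "Putih": [-0.5, 0.2], "Hitam": [-0.3, 0.0]}
    neg, pos = top_and_bottom(logit_effects(decoder, unembed), k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Returning ranked lists rather than a dict preserves order and would also allow two distinct tokens that render identically (such as the two `mann` entries above) if the inputs were keyed by token ID instead of by string.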
Activations Density 0.419%
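Activation density is typically read as the percentage of token positions at which the feature is active. The minimal sketch below assumes "active" means an activation strictly above a threshold of 0; the function name `activation_density` and the threshold parameter are hypothetical, and the dashboard's exact definition and sample size are not given here.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy sample: 4 active positions out of 1000 gives a density of about 0.4%,
    # the same order of magnitude as the 0.419% reported above.
    acts = [0.0] * 1000
    for i in (3, 250, 511, 999):
        acts[i] = 1.2
    print(f"{activation_density(acts):.3f}%")
```

A density this low means the feature fires on only a few tokens in every thousand, which is consistent with a narrowly scoped explanation such as names of people.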