Explanations
names of people and family relationships
Negative Logits
Kara          -0.18
auge          -0.15
dela          -0.15
Tod           -0.15
tingham       -0.15
allah         -0.15
andal         -0.14
istrovství    -0.14
kara          -0.14
xford         -0.14
Positive Logits
Deep          0.30
Deep          0.27
Pri           0.25
Swap          0.25
Pr            0.23
.Deep         0.23
_deep         0.22
deep          0.22
Sand          0.21
Shr           0.20
Activations Density: 0.554%