Explanations
proper nouns, specifically surnames
names of individuals, particularly those related to a specific context or event
Negative Logits
pride       -0.67
Sussex      -0.66
Curse       -0.61
swamp       -0.59
Indians     -0.58
dividend    -0.57
slack       -0.56
Naruto      -0.56
graffiti    -0.55
tyre        -0.55
Positive Logits
mann        1.79
strom       1.58
feld        1.54
lund        1.49
gren        1.46
berg        1.46
meyer       1.43
quist       1.42
baum        1.41
stein       1.38
Activations Density: 0.102%