INDEX
Explanations
references to specific names (possibly surnames)
the names of individuals, particularly those relevant to specific contexts or events
Negative Logits
Token      Logit
meric      -0.75
Doodle     -0.71
ours       -0.69
fruit      -0.65
carts      -0.62
iates      -0.61
gamer      -0.61
Pony       -0.61
grading    -0.61
mediate    -0.61
Positive Logits
Token      Logit
Andersen   1.19
quist      1.05
ensen      1.00
sson       1.00
Bj         0.99
qv         0.98
Sven       0.96
lund       0.95
ersen      0.95
gaard      0.91
Activations Density 0.034%
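
To give a sense of scale for the density figure, here is a minimal sketch, assuming that activation density means the fraction of sampled tokens on which the feature's activation is above zero. The dashboard does not state its exact definition, so that reading is an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only so the arithmetic lands on 0.034%, which is 34 active tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires;
    the dashboard may define it differently.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, of which 34 have a nonzero activation.
sample = [0.0] * 100_000
for i in range(34):
    sample[i * 1_000] = 1.5  # arbitrary positive activation value

density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.034%"
```

Under this reading, a density of 0.034% describes a very sparse feature, one that fires on only a few dozen tokens in every hundred thousand.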