INDEX
Explanations
references to relationships and familial connections
Negative Logits
Elijah    -0.47
Ronald    -0.47
Kenneth   -0.43
muž       -0.42
Derek     -0.42
Trevor    -0.41
Anthony   -0.41
Charles   -0.40
Richard   -0.40
Robert    -0.40
Positive Logits
Ann       0.84
Anne      0.78
Maria     0.78
Sara      0.78
Mary      0.77
Anna      0.77
Laura     0.75
Sarah     0.75
Marie     0.73
Ann       0.73
Activations Density 0.906%