INDEX
Explanations
references to familial relationships
Negative Logits
Jeremy    -0.68
kevin     -0.68
Jared     -0.67
Jared     -0.67
himself   -0.65
Sean      -0.64
Jeremy    -0.63
Kenneth   -0.63
Jason     -0.63
Kevin     -0.62
Positive Logits
actress     0.99
actresses   0.94
herself     0.91
lady        0.84
Elizabeth   0.84
princess    0.82
housewife   0.82
goddess     0.81
matron      0.80
Elizabeth   0.80
Activations Density 0.663%