Explanations
mentions of specific individuals
references to male and female characters in various contexts
Negative Logits
0200       -0.78
CLR        -0.77
iHUD       -0.65
elman      -0.61
wards      -0.60
DOI        -0.60
varying    -0.60
Drivers    -0.59
optional   -0.58
noon       -0.58
Positive Logits
deserves   0.81
cared      0.76
belonged   0.72
pus        0.70
belongs    0.70
knew       0.70
ain        0.69
existed    0.69
cares      0.68
thri       0.66
Activations Density: 0.139%