INDEX
Explanations
mentions of boys
references to boys and their relationships with girls
Negative Logits
Token       Logit
sole        -0.83
ointment    -0.77
lich        -0.75
mediated    -0.68
leased      -0.66
uncture     -0.66
OLOG        -0.66
rior        -0.65
patient     -0.65
Hulk        -0.64
Positive Logits
Token       Logit
hift        1.06
ages        0.98
pace        0.97
mith        0.95
terday      0.91
'           0.88
hips        0.86
ieve        0.83
heet        0.82
ocial       0.81
Activations Density: 0.094%