Explanations
pronouns referring to a specific male individual
references to a specific subject, indicating a focus on a singular male character throughout the text
Negative Logits
Peak          -0.69
Killing       -0.67
reshold       -0.65
earch         -0.63
keleton       -0.60
peak          -0.59
Interest      -0.59
Girls         -0.59
htaking       -0.59
Temperature   -0.58
Positive Logits
'd            1.28
'll           1.20
wrote         1.01
zbollah       0.95
eded          0.93
tweeted       0.89
resy          0.87
ported        0.86
've           0.85
pherd         0.84
Activations Density: 0.254%