Explanations
quotes attributed to a male speaker
pronouns and their associated actions or references
Negative Logits
Token                 Logit
pired                 -0.86
Had                   -0.85
tained                -0.80
were                  -0.76
Been                  -0.73
Were                  -0.70
ayed                  -0.69
Offline               -0.67
guiActiveUnfocused    -0.66
mattered              -0.65
Positive Logits
Token                 Logit
asks                  1.53
complains             1.50
concludes             1.48
agrees                1.47
discovers             1.45
begins                1.44
warns                 1.44
decides               1.44
realizes              1.43
introduces            1.43
Activations Density 0.455%
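
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes the density is the fraction of token positions on which the feature's activation is above zero; the page does not state its exact definition. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all, not an average activation magnitude.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 20,000 token positions, 91 of which fire.
toy = [0.0] * 19909 + [1.0] * 91
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.455%
```

Under this reading, a density of 0.455% means the feature fires on roughly 1 token in 220, which fits a feature tied to a fairly specific context such as a pronoun followed by a reporting verb.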