INDEX
Explanations
pronouns referring to male and female characters
Negative Logits
SPONSORED  -0.80
induce     -0.74
include    -0.73
mandate    -0.67
パ         -0.64
cause      -0.64
create     -0.64
produce    -0.63
denote     -0.63
catch      -0.63
Positive Logits
remembers  1.11
knows      1.03
believes   1.01
has        1.00
hasn       0.97
loves      0.97
deserves   0.96
regrets    0.96
awaits     0.95
feels      0.93
Activations Density 0.279%