INDEX
Explanations
occurrences of the pronoun "He" and its variants
Negative Logits
avity     -0.15
IFn       -0.15
QtCore    -0.15
IPH       -0.15
AAD       -0.15
acs       -0.15
139       -0.15
adic      -0.14
IG        -0.14
μη        -0.14
Positive Logits
imat      0.29
itere     0.22
eres      0.22
bung      0.22
ft        0.21
ilig      0.21
ut        0.21
bam       0.21
il        0.21
ilk       0.20
Activations Density: 0.005%