INDEX
Explanations
phrases with personal pronouns followed by verbs
references to a specific female subject
Negative Logits
Skydragon    -0.73
INGTON       -0.69
atory        -0.68
ornia        -0.67
assing       -0.67
kefeller     -0.65
shaping      -0.61
~~~~         -0.60
ouver        -0.59
ilateral     -0.59
Positive Logits
pherd        1.38
pher         1.31
pard         1.20
ffield       1.14
athed        1.13
athing       1.11
ppard        1.10
ldon         1.09
lly          0.96
ikh          0.95
Activations Density: 0.088%
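
The density figure above is usually read as the fraction of sampled token positions on which the feature fires. The page does not say how it was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (positions where the feature fires) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 88 of 100,000 token positions.
sample = [1.0] * 88 + [0.0] * (100_000 - 88)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.088% corresponds to a feature that fires on roughly 9 out of every 10,000 tokens.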