INDEX
Explanations
phrases related to body parts and actions
possessive pronouns and references to ownership
Negative Logits
Token        Logit
hereafter    -0.73
Rowling      -0.73
Pwr          -0.67
GAN          -0.66
'-           -0.65
Eucl         -0.64
Hear         -0.64
Izan         -0.62
ablishment   -0.62
Aren         -0.61
Positive Logits
Token        Logit
own          1.38
fingers      1.05
knees        1.01
selves       1.00
nose         0.98
fists        0.96
self         0.96
toes         0.94
hips         0.93
lips         0.91
Activations Density: 0.146%