INDEX
Explanations
Proper nouns: names of people and places
Negative Logits
himself         -0.80
erection        -0.75
GoldMagikarp    -0.67
ATIONS          -0.64
looph           -0.64
tyre            -0.63
hindsight       -0.63
rall            -0.63
uesday          -0.62
Sax             -0.62
Positive Logits
herself         1.22
ova             1.11
kaya            0.91
bikini          0.88
eva             0.86
miscar          0.84
husband         0.84
aura            0.84
maid            0.84
veil            0.83
Activations Density 0.423%