INDEX
Explanations
proper names
repeated names or references to individuals
Negative Logits
offline     -0.74
Korra       -0.73
Manip       -0.68
overseas    -0.68
wow         -0.67
Important   -0.66
Yoga        -0.64
Naruto      -0.63
Rebirth     -0.63
numbered    -0.63
Positive Logits
isner       1.55
idel        1.35
iber        1.32
ffer        1.27
iman        1.26
hner        1.26
aney        1.24
isel        1.23
cker        1.20
hn          1.19
Activations Density 0.081%