INDEX
Explanations
references to television shows and their ratings
Negative Logits
Token      Logit
resco      -0.18
openh      -0.17
sov        -0.16
VED        -0.15
esus       -0.15
vang       -0.14
unma       -0.14
tiler      -0.14
Wid        -0.14
Liber      -0.14
Positive Logits
Token      Logit
Nickel     0.22
Adoles     0.19
tween      0.19
Disney     0.18
Teen       0.17
Jonas      0.17
teen       0.17
Teen       0.16
Tween      0.16
wholesome  0.16
Activations Density: 0.181%