INDEX
Explanations
references to television shows
Negative Logits
toi         -0.17
epad        -0.16
inos        -0.16
uries       -0.16
ustralian   -0.16
ches        -0.15
ulas        -0.15
chant       -0.15
bis         -0.15
zos         -0.15
Positive Logits
manship     0.23
runner      0.19
rooms       0.17
ings        0.15
swire       0.15
girls       0.15
biz         0.14
tail        0.14
girl        0.14
519         0.14
Activations Density: 0.034%