Explanations
references to reality television shows and their elements
Negative Logits
atown     -0.16
avern     -0.15
ouro      -0.15
rawer     -0.15
úa        -0.15
ivable    -0.15
mares     -0.15
atism     -0.15
оÑĢаÑı    -0.14
поба      -0.14
Positive Logits
daily         0.17
693           0.17
contestants   0.16
asına         0.16
villa         0.15
692           0.15
contestant    0.14
Mon           0.14
ç̬            0.14
hooks         0.14
Activations Density 0.002%