INDEX
Explanations
references to reality television, specifically related to the "Real Housewives" franchise
Negative Logits
subs      -0.17
rale      -0.16
åĽ£       -0.16
Downs     -0.15
esy       -0.14
hed       -0.14
iban      -0.14
onder     -0.13
UTERS     -0.13
lu        -0.13
Positive Logits
731       0.16
dzi       0.15
-REAL     0.15
INF       0.14
¬         0.14
ynı       0.14
725       0.14
ìľ¤       0.14
Ù쨹      0.14
_WATCH    0.14
Activations Density: 0.012%