Explanations
the word "binge" or variations of it
references to binge-watching and related behaviors
Negative Logits
ando      -0.79
nil       -0.74
OST       -0.72
ioned     -0.71
Nare      -0.68
UTION     -0.67
anim      -0.67
amoto     -0.66
ansom     -0.66
iage      -0.65
Positive Logits
binge     0.97
bul       0.90
inge      0.86
âĸ¬       0.75
eaves     0.74
harb      0.71
clen      0.70
episodes  0.69
hoard     0.69
empt      0.68
Activations Density: 0.021%