INDEX
Explanations
entities or brand names mentioned in texts
references to streaming services and digital content
Negative Logits
Rated     -0.95
instead   -0.79
Reply     -0.71
Atl       -0.71
çİĭ       -0.70
ITH       -0.67
KEN       -0.66
}.        -0.64
______    -0.64
OTH       -0.64
Positive Logits
caveats         0.72
disclaimer      0.71
arrests         0.66
announcements   0.65
nods            0.64
occasional      0.64
exceptions      0.64
additions       0.64
shuffle         0.63
perks           0.62
Activations Density 0.316%