INDEX
Explanations
phrases indicating good or bad news about various topics
Negative Logits
opensource    -0.17
ootball       -0.16
ETCH          -0.14
ingerprint    -0.14
odable        -0.13
役            -0.13
ΩΣ            -0.13
privilege     -0.13
FirstChild    -0.13
ritable       -0.13
Positive Logits
news          0.38
-news         0.31
news          0.29
News          0.27
NEWS          0.25
overall       0.24
News          0.23
signs         0.23
(news         0.21
NEWS          0.21
Activations Density 0.050%