INDEX
Explanations
references to previous works or articles
phrases indicating reference to prior content or previous discussions
Negative Logits
$$$$        -0.79
orsche      -0.67
BY          -0.65
orean       -0.65
arov        -0.64
ereo        -0.62
adle        -0.61
scissors    -0.60
aren        -0.60
restores    -0.59
Positive Logits
blog        1.12
article     1.08
blogs       1.06
articles    1.03
blogs       0.96
posts       0.92
Blog        0.91
discussing  0.89
blog        0.88
column      0.86
Activations Density 0.268%