INDEX
Explanations
phrases indicating subjective evaluations or personal reflections
Negative Logits
558     -0.15
itas    -0.15
741     -0.14
ýt      -0.14
_SR     -0.14
íĻĪ     -0.14
ċ       -0.13
eto     -0.13
855     -0.13
585     -0.13
POSITIVE LOGITS
posts       0.42
blog        0.40
article     0.38
post        0.35
blog        0.34
articles    0.33
posting     0.32
posts       0.32
-blog       0.31
blogs       0.31
Activations Density 0.209%