INDEX
Explanations
specific instances of the word "this" followed by another word or phrase
references to reading specific articles, posts, or communications
Negative Logits
roxy      -0.79
hesda     -0.72
aires     -0.71
ichick    -0.71
asia      -0.70
akable    -0.68
omaly     -0.67
ongevity  -0.66
arent     -0.66
otics     -0.66
Positive Logits
aloud           1.44
excerpts        0.93
papers          0.87
instructions    0.84
transcript      0.84
passages        0.84
龍契士          0.82
DragonMagazine  0.82
reviews         0.81
blogs           0.81
Activations Density 0.140%