Explanations
strings related to user profiles and forum posts
instances of punctuation or emotive expressions
Negative Logits
Token             Logit
ĸļ                -0.81
nomine            -0.74
calendars         -0.73
Hust              -0.71
yearly            -0.71
awaru             -0.68
Twain             -0.66
dotted            -0.65
escaping          -0.65
enriched          -0.64
Positive Logits
Token             Logit
Reply             1.04
lol               0.98
Posted            0.93
https             0.90
Spoiler           0.88
http              0.85
EDIT              0.83
Posts             0.83
________________  0.82
Thread            0.82
Activations Density: 0.156%