INDEX
Explanations
phrases related to commentary or responses in written discussions
Negative Logits
Token      Logit
pron       -0.15
wyn        -0.15
utes       -0.14
ramer      -0.14
ernen      -0.13
iani       -0.13
WithURL    -0.13
λή         -0.13
tridges    -0.13
itou       -0.12
Positive Logits
Token        Logit
Anonymous    0.38
anonymous    0.37
anonymous    0.32
Anonymous    0.30
anonymously  0.28
anon         0.26
someone      0.26
anonym       0.24
someone      0.23
onymous      0.21
Activations Density: 0.142%