Explanations
mentions of questioning or expressing opinions
questions and dialogue
Negative Logits
Token     Logit
seless    -0.72
©¶æ       -0.69
isol      -0.68
runs      -0.67
imar      -0.65
1934      -0.65
steen     -0.65
CLE       -0.63
leg       -0.62
buggy     -0.62
Positive Logits
Token       Logit
.?          0.83
topic       0.80
moderator   0.77
sugg        0.77
Explain     0.77
?:          0.75
/?          0.72
captcha     0.71
kb          0.68
Void        0.67
Activations Density: 0.493%