Explanations
questions starting with "So what"
questions that begin with "what."
Negative Logits
renheit     -0.75
ãĥĭ         -0.65
anus        -0.65
rim         -0.64
agonists    -0.64
20439       -0.63
gression    -0.62
mens        -0.62
apsed       -0.61
cit         -0.61
Positive Logits
exactly     1.15
does        0.96
?           0.94
do          0.92
?????       0.90
happens     0.90
SHOULD      0.89
DOES        0.86
?!          0.85
!?          0.85
Activations Density: 0.083%