INDEX
Explanations
words related to certainty and emphasis
expressions that indicate clarification or emphasis, particularly using phrases like "of course" and "in fact."
Negative Logits
prus          -0.67
pione         -0.64
\'            -0.62
oun           -0.60
inki          -0.60
ategory       -0.60
krit          -0.59
ailability    -0.59
kefeller      -0.59
Doodle        -0.58
Positive Logits
,             1.00
,,            0.88
,.            0.81
oret          0.77
*,            0.77
.,            0.69
)             0.67
anyway        0.65
,-            0.64
anyways       0.62
Activations Density: 0.107%