INDEX
Explanations
phrases emphasizing certainty or strong affirmation
the word "certainly" in various contexts
Negative Logits
Token      Logit
glers      -0.81
agus       -0.76
OSH        -0.74
uese       -0.74
lay        -0.73
gencies    -0.72
Offline    -0.70
idas       -0.69
gency      -0.68
ulative    -0.66
Positive Logits
Token        Logit
deserved     0.78
qualifies    0.77
behaved      0.74
exagger      0.71
benefited    0.70
deline       0.69
ought        0.67
appreciated  0.67
torped       0.66
appreci      0.65
Activations Density: 0.025%