INDEX
Explanations
phrases or words expressing difficulty or disbelief
phrases that indicate difficulty or challenges in understanding or explaining concepts
Negative Logits
Token       Logit
afety       -0.86
ements      -0.66
plates      -0.65
Dialogue    -0.65
erville     -0.64
leys        -0.64
RIPT        -0.63
mails       -0.62
Needs       -0.61
cand        -0.61
Positive Logits
Token          Logit
achieve        1.09
accomplish     1.05
distinguish    1.04
obtain         1.04
attain         1.02
imagine        1.02
reconcile      1.01
convince       1.00
conceive       0.99
penetrate      0.97
Activations Density: 0.090%