INDEX
Explanations
phrases where the author expresses uncertainty or lack of knowledge
phrases expressing uncertainty or lack of knowledge
Negative Logits
GBT        -0.72
gencies    -0.72
redients   -0.69
izont      -0.67
ounters    -0.63
attery     -0.63
arra       -0.63
incial     -0.61
onding     -0.61
ishable    -0.60
Positive Logits
RP         0.69
how        0.69
oooooooo   0.69
hin        0.68
anymore    0.66
darn       0.64
éĸ         0.64
exactly    0.64
Coh        0.64
why        0.63
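Dashboards like this one commonly derive the logit lists by projecting the feature's decoder direction onto each token's unembedding vector and then reporting the most negative and most positive results. The page does not say how its values were computed, so treat the following as a minimal sketch under that assumption. The function names `logit_weights` and `top_and_bottom` are hypothetical, and the toy vocabulary is made up for illustration only.

```python
from typing import Dict, List, Tuple

def logit_weights(feature_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector.

    Assumption: the dashboard uses raw dot products (no normalization).
    """
    return {tok: sum(f * u for f, u in zip(feature_dir, col))
            for tok, col in unembed.items()}

def top_and_bottom(weights: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])  # ascending by weight
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Hypothetical 2-d toy example; real models use thousands of dimensions.
feature_dir = [1.0, 0.0]
unembed = {
    "why": [0.6, 0.1],
    "how": [0.7, 0.0],
    "GBT": [-0.7, 0.2],
    "attery": [-0.6, 0.5],
}
positive, negative = top_and_bottom(logit_weights(feature_dir, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

A positive weight means that writing this feature into the residual stream pushes the model toward predicting that token; a negative weight pushes it away.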
Activation Density: 0.064%
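Activation density is usually read as the percentage of token positions on which the feature fires. A value of 0.064% works out to roughly 64 activations per 100,000 tokens. The sketch below assumes that "fires" means an activation strictly above zero. The threshold and the name `activation_density` are hypothetical choices, not something this page specifies.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly above 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Toy data: 64 firing positions out of 100,000 tokens.
acts = [0.0] * 99_936 + [1.5] * 64
print(f"{activation_density(acts):.3f}%")
```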