Explanations
Questions or statements expressing uncertainty or the need to gather more information
Statements of uncertainty or knowledge limitations
Negative Logits
phrine     -0.58
rin        -0.58
plex       -0.58
alia       -0.57
jet        -0.57
TN         -0.56
ctory      -0.56
leted      -0.56
ciating    -0.56
Parad      -0.55
Positive Logits
definitively   1.21
anymore        1.15
whether        1.06
exactly        1.05
accurately     1.04
precisely      0.98
nor            0.98
exact          0.95
precise        0.95
anything       0.94
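Lists like the two above are typically produced by projecting a feature's output direction onto the vocabulary and keeping the extremes. The sketch below assumes that approach (the card itself does not say how its logits were computed): `direction` stands in for the feature's decoder vector and `W_U` for the model's unembedding matrix, both hypothetical inputs. The toy numbers in the usage example are made up and are not this feature's values.

```python
import numpy as np

def top_logits(direction, W_U, vocab, k=10):
    """Rank tokens by the direct logit contribution of a feature direction.

    Assumptions (hypothetical, not from the source):
      direction: (d_model,) vector, e.g. the feature's decoder row
      W_U:       (d_model, vocab_size) unembedding matrix
      vocab:     list of token strings of length vocab_size
    Returns (positive, negative) lists of (token, score) pairs.
    """
    scores = direction @ W_U          # one logit contribution per vocabulary token
    order = np.argsort(scores)        # indices sorted from most negative to most positive
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example with made-up values (not taken from this feature)
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
W_U = np.array([[1.0, 0.5, -0.2, -0.6],
                [0.05, 0.5, -0.3, 0.0]])
direction = np.array([1.0, 1.0])
pos, neg = top_logits(direction, W_U, vocab, k=2)
```

Here `pos` lists `tok_a` then `tok_b`, and `neg` lists `tok_d` then `tok_c`, mirroring how the Positive and Negative Logits columns are ordered from most to least extreme.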
Activation Density: 0.199%
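Activation density is usually read as the fraction of tokens on which the feature fires, so 0.199% corresponds to roughly 2 tokens in every 1,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0 (a hypothetical choice; the card does not state its cutoff) and uses made-up activations.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed firing rule)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Made-up example: 1,000 tokens, of which 2 have a positive activation
acts = np.zeros(1000)
acts[[17, 503]] = [1.3, 0.7]
density = activation_density(acts)
```

With these inputs `density` is 0.002, i.e. 0.2%, close to the 0.199% reported above.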