INDEX
Explanations
specific questions or statements expressing uncertainty and seeking knowledge or information
questions about knowledge and certainty
Negative Logits
athi        -0.76
bilt        -0.75
ashtra      -0.70
assi        -0.68
ahime       -0.68
Jackets     -0.67
udeau       -0.66
phrine      -0.64
urdue       -0.64
JO          -0.64
Positive Logits
beforehand   0.84
whats        0.79
whether      0.78
guesses      0.75
worthiness   0.75
how          0.75
estamp       0.73
exactly      0.70
checked      0.69
detect       0.68
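Lists like these are typically produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that approach; the helper `top_logits`, its parameters, and the random toy weights are hypothetical illustrations, not the dashboard's actual implementation or the real model.

```python
import numpy as np


def top_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper (assumption): decoder_dir is the feature's decoder
    vector of shape (d_model,), W_U is the unembedding matrix of shape
    (d_model, vocab_size), and vocab maps token ids to strings.
    """
    logits = decoder_dir @ W_U  # direct effect of the feature on every token's logit
    order = np.argsort(logits)  # token ids sorted from lowest to highest logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with random weights; these are NOT the real model's values.
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 50
W_U = rng.normal(size=(d_model, vocab_size))
decoder_dir = rng.normal(size=d_model)
vocab = [f"tok{i}" for i in range(vocab_size)]

neg, pos = top_logits(decoder_dir, W_U, vocab, k=10)
print("Negative:", neg[:3])
print("Positive:", pos[:3])
```

Sorting once with `argsort` and slicing both ends gives the two lists in a single pass; real dashboards may additionally filter or normalize tokens, which this sketch does not attempt.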
Activation density: 0.212%
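A density of 0.212% means the feature fires on roughly 2 of every 1,000 tokens (0.00212 × 1,000 = 2.12). The sketch below assumes, as is common for sparse-autoencoder feature dashboards, that density is the percentage of sampled tokens with a positive activation; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Hypothetical helper (assumption): `activations` holds one activation
    value per sampled token for a single feature.
    """
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy example: 10,000 tokens, the feature fires on only 3 of them.
acts = np.zeros(10_000)
acts[[3, 77, 512]] = [0.4, 1.3, 2.1]
density = activation_density(acts)
print(f"Activation density: {density:.3f}%")
```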