INDEX
Explanations
questions
questions in the text
Negative Logits
tremend         -0.69
princ           -0.67
scrut           -0.66
bidden          -0.66
civilisation    -0.65
strugg          -0.65
unbeliev        -0.65
reckoning       -0.65
notor           -0.64
uckland         -0.64
POSITIVE LOGITS
Were       1.35
Explain    1.29
Did        1.22
Was        1.22
What       1.22
Does       1.19
How        1.19
Would      1.17
Are        1.17
Where      1.16
Activations Density 0.079%