Explanations
questions starting with "What if"
instances of hypothetical scenarios or conditional statements
Negative Logits
Logit  Token
-0.67  Eye
-0.66  abre
-0.64  ila
-0.64  ================================================================
-0.63  hands
-0.63  akia
-0.62  gae
-0.62  horse
-0.61  arm
-0.60  depths
Positive Logits
Logit  Token
0.84   someday
0.80   you
0.75   fy
0.74   they
0.73   somebody
0.72   someone
0.70   ?]
0.69   we
0.68   Gutenberg
0.67   hypot
Activations Density 0.030%