INDEX
Explanations
questions that start with "What is."
questions or interrogative phrases
Negative Logits
manif        -0.71
marsh        -0.69
binge        -0.68
encount      -0.66
normalized   -0.66
healed       -0.66
mutually     -0.66
labor        -0.66
colon        -0.66
slam         -0.66
Positive Logits
[/            1.00
Improve       0.95
Nope          0.95
¶             0.94
Where         0.93
Anyone        0.91
Why           0.88
Yourself      0.88
Seriously     0.86
Answer        0.86
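The two logit lists can be read as the feature's direct effect on the output vocabulary. The sketch below assumes, as is common for SAE dashboards but not confirmed by this page, that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. Under that assumption, the lists are the k lowest and k highest scores. The function `logit_lists` and the toy inputs are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def logit_lists(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Pair], List[Pair]]:
    """Return (negative, positive) lists of the k lowest/highest token scores.

    Assumed scoring rule (hypothetical): score(token) = decoder_dir . unembed[token].
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional example; vectors are made up, not taken from any model.
    toy_unembed = {
        "Why": [0.8, -0.2],
        "Where": [0.6, 0.0],
        "marsh": [-0.5, 0.4],
        "slam": [-0.3, 0.1],
    }
    neg, pos = logit_lists([1.0, -0.5], toy_unembed, k=2)
    print([t for t, _ in pos])  # ['Why', 'Where']
    print([t for t, _ in neg])  # ['marsh', 'slam']
```

In a real model the vocabulary has tens of thousands of tokens, so only the extremes are shown, which matches the ten-entry lists above.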
Activation Density: 0.099%
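A density of 0.099% means the feature fires on roughly one token in a thousand. The sketch below assumes that density is the percentage of sampled tokens whose activation is strictly above zero. The page does not state the exact threshold or sample, so `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed rule)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Made-up sample: 99 active tokens out of 100,000.
    sample = [0.0] * 99_901 + [1.3] * 99
    print(f"{activation_density(sample):.3f}%")  # 0.099%
```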