INDEX
Explanations
questions or phrases asking for definitions or explanations
the phrase "What is" followed by a question or inquiry
Negative Logits
iard        -0.78
lems        -0.76
cember      -0.68
lator       -0.67
legram      -0.67
owan        -0.66
lette       -0.65
gem         -0.64
lia         -0.64
lication    -0.64
Positive Logits
happening   1.10
wrong       0.85
YOUR        0.84
your        0.83
meant       0.81
occurring   0.76
going       0.74
Wrong       0.71
stopping    0.70
causing     0.70
Activations Density: 0.050%