INDEX
Explanations
phrases introducing a quote or piece of information
phrases starting with "Here’s" followed by additional context or information
Negative Logits
aneers         -0.70
ibles          -0.70
ange           -0.69
cair           -0.68
hate           -0.67
headaches      -0.63
esm            -0.62
aceutical      -0.61
iosity         -0.61
extermination  -0.60
Positive Logits
hoping         0.82
illustrate     0.71
excerpt        0.65
example        0.65
highlights     0.64
why            0.63
yours          0.63
Demo           0.63
snippet        0.62
courtesy       0.61
Activations Density: 0.056%