INDEX
Explanations
phrases starting with "Here's what..." and similar variations
phrases that introduce information or clarification
Negative Logits
ename        -0.78
oit          -0.69
aukee        -0.65
ways         -0.62
rone         -0.60
idable       -0.59
lust         -0.59
drowning     -0.58
nels         -0.57
oise         -0.57
Positive Logits
happened       1.13
happens        1.13
transpired     0.93
else           0.90
happ           0.89
went           0.75
ensued         0.75
distinguishes  0.74
you            0.73
we             0.71
Activations Density: 0.085%