INDEX
Explanations
phrases that introduce content or signal a new topic
instances of the phrase "Like this:"
Negative Logits
uve         -0.69
wreck       -0.58
offending   -0.56
harness     -0.56
interpol    -0.55
alist       -0.54
ment        -0.53
mble        -0.52
verning     -0.51
unloaded    -0.51
POSITIVE LOGITS
Cosponsors   0.73
(blank)      0.62
{*           0.62
pees         0.61
Choose       0.60
Provided     0.58
Click        0.57
hover        0.57
Click        0.57
iPhone       0.56
Activations Density 0.052%