INDEX
Explanations
verbs indicating significance, impact, or consequence
phrases that denote implications or meanings
Negative Logits
iliate     -0.69
phrine     -0.67
oute       -0.64
Kings      -0.64
genre      -0.63
iban       -0.63
taboola    -0.63
ibble      -0.63
aughs      -0.63
@#&        -0.62
Positive Logits
goodbye        1.02
sacrificing    0.72
nothing        0.68
hift           0.68
fewer          0.66
something      0.66
shovel         0.64
spirited       0.64
risking        0.62
everything     0.62
Activations Density: 0.038%