Explanations
phrases indicating change or transition
Negative Logits
erville     -0.08
now         -0.07
ummies      -0.07
thus        -0.07
therefore   -0.07
azor        -0.07
nett        -0.07
nel         -0.06
soon        -0.06
now         -0.06
Positive Logits
icz         0.07
Tribune     0.06
eme         0.06
ank         0.06
ibal        0.06
PTION       0.06
Tender      0.06
بیر         0.06
arian       0.06
_VARIABLE   0.06
Activation Density: 0.007%
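The logit lists and the density figure above are standard sparse-autoencoder feature-dashboard statistics. The sketch below shows how such numbers are commonly computed. It is a minimal illustration under stated assumptions, not this page's actual pipeline. It assumes the feature has a decoder direction `decoder_dir` and the model has an unembedding matrix `W_U`. Both names are hypothetical, and the sketch ignores the final layer norm. Under those assumptions, a token's logit effect is the dot product of the decoder direction with that token's unembedding column. Activation density is the percentage of tokens on which the feature fires.

```python
import numpy as np


def logit_effects(decoder_dir, W_U, vocab, k=3):
    """Rank tokens by how much one feature's decoder direction pushes their logits.

    Assumption: direct path only (decoder_dir @ W_U), final layer norm ignored.
    decoder_dir has shape (d_model,); W_U has shape (d_model, vocab_size).
    """
    effects = decoder_dir @ W_U                      # shape: (vocab_size,)
    order = np.argsort(effects)                      # ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


if __name__ == "__main__":
    # Toy example with hypothetical tokens and values, not taken from this page.
    vocab = ["a", "b", "c", "d"]
    decoder_dir = np.array([1.0, 0.0])
    W_U = np.array([[0.5, -0.2, 0.1, -0.8],
                    [0.3, 0.3, 0.3, 0.3]])
    neg, pos = logit_effects(decoder_dir, W_U, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
    print("density %:", activation_density([0, 0, 0.3, 0, 1.2, 0, 0, 0, 0, 0]))
```

At the scale shown here, a density of 0.007% means the feature is active on roughly 7 of every 100,000 tokens. Because the logit effects use only the direct path, they describe what the feature writes toward the output vocabulary, not what causes it to activate.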