INDEX
Explanations
phrases related to contrasting or adding on to information
phrases indicating rules or regulations
Negative Logits
Mechdragon   -0.87
Dian         -0.68
IOR          -0.65
Winged       -0.63
Thirty       -0.62
blunt        -0.60
Kund         -0.60
Enlarge      -0.60
Rwanda       -0.59
ANK          -0.58
Positive Logits
ought        0.93
should       0.89
nery         0.89
wont         0.87
SHOULD       0.84
shouldn      0.78
thens        0.77
still        0.76
ought        0.76
will         0.75
Activations Density 0.085%