INDEX
Explanations
phrases indicating conditionality or potentiality
Negative Logits
maybe       -0.16
appable     -0.16
amba        -0.15
atat        -0.15
perhaps     -0.15
å°ĭ         -0.15
undy        -0.14
possible    -0.14
\<^         -0.14
possible    -0.14
Positive Logits
well        0.36
well        0.34
Might       0.33
might       0.31
might       0.31
Well        0.29
WELL        0.27
Well        0.26
wells       0.24
wel         0.23
Activations Density 0.038%