INDEX
Explanations
phrases indicating probabilities or likelihoods of events
Negative Logits
allegedly     -0.23
supposedly    -0.22
possibly      -0.18
presumably    -0.17
apparently    -0.17
Apparently    -0.16
Possibly      -0.16
arguably      -0.16
reportedly    -0.16
.au           -0.16
Positive Logits
hood          0.41
hood          0.28
gonna         0.27
going         0.26
going         0.26
candidates    0.26
be            0.24
scenario      0.22
ly            0.21
scenarios     0.21
Activations Density: 0.044%