INDEX
Explanations
phrases indicating speculation or uncertainty about future events
Negative Logits
AndWait     -0.15
Fucking     -0.15
finally     -0.15
ampp        -0.15
fucking     -0.14
icerca      -0.14
utom        -0.14
Fuck        -0.14
Fuck        -0.14
éĻ          -0.14
Positive Logits
rather      0.17
somewhat    0.16
quite       0.16
rather      0.16
wohl        0.16
likely      0.15
plug        0.15
Likely      0.14
088         0.14
ä¹ĭä¸Ģ      0.14
Activations Density 0.073%