INDEX
Explanations
phrases signaling a transition or change in topic
phrases that introduce speculation or uncertainty
NEGATIVE LOGITS
akes      -0.81
ocaust    -0.77
iphate    -0.77
ship      -0.77
iak       -0.72
mons      -0.71
ieves     -0.71
endar     -0.70
atches    -0.70
efer      -0.69
POSITIVE LOGITS
unsurprisingly    1.45
sensing           1.14
unsur             1.06
surprisingly      1.03
unwittingly       0.97
predictably       0.97
best              0.96
ironically        0.93
understandably    0.92
unfairly          0.88
Activations Density: 0.057%