INDEX
Explanations
phrases expressing causation or explanation
phrases introducing a clause or additional information
Negative Logits
Token          Logit
athi           -0.67
Bas            -0.65
Behind         -0.64
Rog            -0.61
STE            -0.60
Burg           -0.58
BLE            -0.57
EMBER          -0.57
Crash          -0.57
Hutch          -0.57
Positive Logits
Token          Logit
resulted        0.88
admittedly      0.87
allows          0.84
prompts         0.82
presumably      0.82
brings          0.81
fortunately     0.78
thankfully      0.77
incidentally    0.77
milo            0.75
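A minimal sketch of how ranked logit lists like the two above are commonly produced for a sparse-autoencoder feature, assuming (not confirmed by this page) that each value is the dot product of the feature's decoder direction with a token's column of the model's unembedding matrix. The function names `logit_weights` and `top_logits`, and the two-dimensional toy vectors, are hypothetical and chosen only for illustration; they do not reproduce the values listed here.

```python
def logit_weights(decoder_dir, unembed):
    """Project a feature's decoder direction onto each token's unembedding column.

    decoder_dir: list of d_model floats (hypothetical feature direction).
    unembed: dict mapping token -> list of d_model floats (hypothetical W_U columns).
    Returns a dict mapping token -> logit contribution.
    """
    weights = {}
    for token, column in unembed.items():
        if len(column) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {token!r}")
        weights[token] = sum(d * u for d, u in zip(decoder_dir, column))
    return weights


def top_logits(weights, k):
    """Return the k most negative and k most positive (token, logit) pairs."""
    ranked = sorted(weights.items(), key=lambda item: item[1])
    negative = ranked[:k]           # smallest values first
    positive = ranked[::-1][:k]     # largest values first
    return negative, positive


# Toy usage: made-up vectors, not taken from the dashboard.
toy_unembed = {
    "resulted": [0.8, -0.2],
    "allows": [0.5, -0.6],
    "Bas": [-0.4, 0.5],
    "athi": [-0.6, 0.2],
}
weights = logit_weights([1.0, -0.5], toy_unembed)
negative, positive = top_logits(weights, 2)
print("Negative:", [token for token, _ in negative])
print("Positive:", [token for token, _ in positive])
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid the full sort.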
Activation Density: 0.132%
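Activation density is usually read as the percentage of tokens in a sample on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the actual threshold and token sample behind the 0.132% figure are not stated on this page, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    activations: sequence of per-token activation values for one feature.
    threshold: assumed firing cutoff (0.0 by default, i.e. any positive activation).
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy usage: 132 firing tokens out of 100,000 sampled tokens.
sample = [0.0] * 99_868 + [1.5] * 132
print(f"Activation Density: {activation_density(sample):.3f}%")
```

At this density the feature is active on roughly 1 token in 760, which is typical of the sparse features such dashboards describe.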