INDEX
Explanations
phrases signaling comparison or contrast
references to the concept of "which" as it pertains to explanations or clarifications in the text
Negative Logits
Token        Logit
Behind       -0.76
grim         -0.73
let          -0.68
rior         -0.68
Roaming      -0.62
bug          -0.62
Ott          -0.62
hat          -0.60
da           -0.60
lean         -0.60
Positive Logits
Token        Logit
soever       0.90
consisted    0.75
akespeare    0.75
corresponds  0.74
originated   0.73
consists     0.73
exceeds      0.73
lasted       0.71
constitutes  0.71
resulted     0.71
Activations Density: 0.035%