INDEX
Explanations
transitions indicating a conclusion or result
statements signaling conclusions or summaries
Negative Logits
"},"        -0.62
______      -0.59
ga          -0.58
duction     -0.57
Rumble      -0.56
upgr        -0.56
GER         -0.55
Barb        -0.55
exclusively -0.55
burg        -0.55
Positive Logits
fter        0.83
forth       0.82
pite        0.77
ername      0.76
forward     0.76
aucuses     0.74
ccording    0.73
entimes     0.72
hester      0.72
noon        0.70
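The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. As a minimal sketch, assuming (this is not stated in the dump) that each value is the feature's output direction projected through the model's unembedding matrix, the lists could be produced like this. The function `top_logits` and its arguments are hypothetical names used only for illustration.

```python
def top_logits(decoder_dir, unembed, vocab, k=10):
    """Return (top positive, top negative) (token, logit) pairs.

    Assumption: decoder_dir is the feature's output direction (length d_model),
    unembed is a d_model x n_vocab matrix given as a list of rows,
    and vocab lists the token string for each column.
    """
    d_model = len(decoder_dir)
    # Logit effect on each token t: sum_i decoder_dir[i] * unembed[i][t]
    logits = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by logit: most negative first, most positive last
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative
```

With a toy two-dimensional model, `top_logits([1.0, 0.0], [[0.5, -0.2, 0.9], [1.0, 1.0, 1.0]], ["a", "b", "c"], k=1)` returns `([("c", 0.9)], [("b", -0.2)])`.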
Activations Density 0.034%
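Activation density is the share of tokens on which the feature fires, so 0.034% means roughly 34 activating tokens per 100,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the threshold and the function name `activation_density` are assumptions, not taken from the dashboard):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    # Count tokens where the feature is active
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

For example, 34 active tokens out of 100,000 gives a density of about 0.034%.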