INDEX
Explanations
Phrases indicating a sequence of events, specifically events that happened just before a certain action or outcome.
Pronouns, particularly repeated mentions of "he," "she," "I," and "they."
Negative Logits
Associated      -0.71
Monteneg        -0.64
Canaver         -0.64
Consortium      -0.64
Electrical      -0.63
Federation      -0.63
Optim           -0.61
understatement  -0.61
sarc            -0.61
Strategy        -0.60
Positive Logits
've             1.04
arrived         0.91
started         0.87
began           0.87
're             0.87
hran            0.84
became          0.82
'd              0.82
exited          0.82
arrive          0.82
Activations Density: 0.136%