Explanations
instances of the word "when" indicating temporal references
Negative Logits
agin      -0.73
Die       -0.70
omal      -0.70
urden     -0.69
icker     -0.68
lation    -0.66
andal     -0.66
ictive    -0.65
odge      -0.63
abre      -0.63
Positive Logits
discussing      0.89
asked           0.83
he              0.81
faced           0.78
confronted      0.77
researching     0.77
introducing     0.75
they            0.74
encountering    0.74
pressed         0.72
Activations Density 0.065%