INDEX
Explanations
specific words related to informative text, signaling transitions or new parts of the text
the word "While."
Negative Logits
illet   -0.80
atron   -0.72
bled    -0.72
ade     -0.71
tnc     -0.70
iola    -0.70
rium    -0.70
isable  -0.69
enter   -0.69
elled   -0.69
Positive Logits
acknowledging  1.19
researching    0.99
conced         0.94
browsing       0.94
discussing     0.91
agreeing       0.86
mentioning     0.84
dismissing     0.83
admitting      0.83
respecting     0.82
Activations Density: 0.048%