INDEX
Explanations
multiple occurrences of the word "The" indicating a focus on article usage
Negative Logits
nt        -0.17
ovic      -0.17
ther      -0.16
sis       -0.15
mer       -0.15
land      -0.14
sm        -0.13
atan      -0.13
)>↵       -0.13
erca      -0.13
Positive Logits
orem      0.24
oret      0.24
ories     0.23
odore     0.22
oretical  0.21
odor      0.21
issen     0.17
eview     0.17
TRL       0.15
aim       0.15
Activations Density 0.478%