INDEX
Explanations
mentions of specific locations within sentences
the phrase "the" and its various usages across sentences
Negative Logits
itiz    -0.73
ional   -0.72
#$      -0.65
irds    -0.64
abel    -0.61
edly    -0.61
yet     -0.61
owes    -0.61
abe     -0.60
uty     -0.58
Positive Logits
meantime    1.69
midst       1.34
absence     1.26
aftermath   1.23
case        1.05
ensuing     1.04
wake        1.01
simplest    0.96
meanwhile   0.96
context     0.96
Activations Density: 0.137%