INDEX
Explanations
mentions of a specific entity or topic within a longer discussion context
instances of the word "the"
Negative Logits
itiz        -0.77
ional       -0.72
irds        -0.68
abe         -0.64
Topics      -0.62
pointers    -0.62
owes        -0.62
#$          -0.62
malf        -0.61
uty         -0.61
Positive Logits
meantime    1.66
midst       1.37
aftermath   1.26
absence     1.23
context     1.07
simplest    1.05
guise       1.04
ensuing     1.01
wake        1.00
meanwhile   0.98
Activations Density: 0.167%