Explanations
mentions of specific locations or settings
occurrences of the word "the" in various contexts
Negative Logits
itiz       -0.70
emate      -0.66
ulence     -0.66
pointers   -0.64
witch      -0.63
anism      -0.63
antes      -0.62
pers       -0.60
ional      -0.60
irds       -0.60
Positive Logits
meantime    1.45
midst       1.24
aftermath   1.16
absence     1.06
guise       1.04
simplest    1.04
context     1.02
same        0.97
ensuing     0.95
case        0.95
Activation Density: 0.148%