Explanations
instances of phrases indicating novelty or uniqueness
phrases indicating novelty or unprecedented experiences
Negative Logits
adra        -0.62
tomorrow    -0.61
perpetual   -0.56
imming      -0.54
ogan        -0.53
redients    -0.52
subset      -0.52
idi         -0.51
Tues        -0.51
endless     -0.51
Positive Logits
before       1.35
before       1.17
nor          1.14
BEFORE       1.10
except       0.98
Before       0.94
EVER         0.93
anywhere     0.90
beforehand   0.88
unless       0.86
Activations Density 0.321%