Explanations
the word "nights" with strong activations
occurrences of the word "nights."
Negative Logits
bloc        -0.72
Swiss       -0.66
lda         -0.66
ression     -0.65
ific        -0.65
offic       -0.63
Canary      -0.63
BN          -0.61
eering      -0.61
Democrat    -0.61
Positive Logits
creen       1.35
mith        1.25
pring       1.12
hift        1.08
uits        1.06
cape        1.04
poons       1.03
hips        1.01
hops        1.01
cale        1.01
Activations Density: 0.031%