Explanations
conjunctions "or" with high activation values
instances of the word "or."
Negative Logits
ires     -0.73
estamp   -0.63
ourn     -0.61
irlf     -0.61
ublic    -0.59
idem     -0.57
ascus    -0.57
Wast     -0.57
irms     -0.57
ights    -0.56
Positive Logits
ifice           1.35
Else            1.31
acles           1.27
acle            1.25
acular          1.14
chard           1.10
chid            1.08
nery            1.04
alternatively   1.02
ific            1.00
Activations Density: 0.192%