Explanations
the repetition of the phrase "As" in varying contexts
Negative Logits
Token        Logit
activate     -0.17
evidenced    -0.17
eat          -0.16
çµ           -0.16
ever         -0.16
activated    -0.16
ean          -0.15
activ        -0.15
ey           -0.15
erap         -0.15
Positive Logits
Token        Logit
ylum         0.23
untos        0.20
raf          0.20
coli         0.20
mode         0.18
soon         0.18
gard         0.18
pects        0.18
bestos       0.17
far          0.17
Activations Density: 0.040%