INDEX
Explanations
occurrences of the word "As"
Negative Logits
evidenced   -0.17
jected      -0.16
eat         -0.16
activate    -0.16
ysi         -0.16
tas         -0.15
activated   -0.15
actly       -0.15
ivec        -0.15
asts        -0.15
Positive Logits
ylum        0.21
raf         0.21
soon        0.19
coli        0.18
far         0.18
ención      0.18
untos       0.18
pects       0.17
gard        0.17
mode        0.17
Activations Density: 0.039%