INDEX
    Explanations

    occurrences of the word "As"

    Negative Logits
    " evidenced"    -0.17
    "jected"        -0.16
    "eat"           -0.16
    "activate"      -0.16
    "ysi"           -0.16
    "tas"           -0.15
    "activated"     -0.15
    "actly"         -0.15
    "ivec"          -0.15
    "asts"          -0.15
    Positive Logits
    "ylum"          0.21
    "raf"           0.21
    " soon"         0.19
    "coli"          0.18
    " far"          0.18
    "ención"        0.18
    "untos"         0.18
    "pects"         0.17
    "gard"          0.17
    "mode"          0.17
    Activation Density: 0.039%

    No Known Activations