INDEX
    Explanations

    the word 'demon' or variations of it

    Negative Logits
    " Mellon"     -0.67
    "artney"      -0.64
    " Bacon"      -0.64
    "abouts"      -0.63
    " Howe"       -0.62
    "arkin"       -0.62
    " Norn"       -0.61
    "arella"      -0.61
    "ippi"        -0.61
    " Morales"    -0.60
    Positive Logits
    "stration"    1.74
    "strate"      1.47
    "str"         1.25
    "ization"     1.11
    "iac"         1.11
    "izing"       1.04
    "ica"         1.00
    "izations"    1.00
    "isation"     0.99
    "ising"       0.97
    Activation Density: 0.017%

    No Known Activations