INDEX
    Explanations

    the word "normal" or related terms

    Negative Logits
    Token      Logit
    artisan    -0.77
    hani       -0.70
    better     -0.67
    wark       -0.67
    iosyncr    -0.63
    raped      -0.63
     Goff      -0.63
    REL        -0.63
    intel      -0.63
    Sov        -0.62
    Positive Logits
    Token      Logit
    cy         1.50
    ization    1.38
    izing      1.37
    izes       1.33
    ised       1.28
    isation    1.28
    ize        1.24
    ized       1.20
    izer       1.14
    izers      1.13
    Activation Density: 0.034%

    No Known Activations