INDEX
    Explanations

    mentions of the word 'cat'

    Negative Logits
    ' htt'           -0.71
    'gur'            -0.67
    ' Vander'        -0.65
    ' Glas'          -0.64
    ' Sachs'         -0.63
    ' Vaugh'         -0.63
    'assetsadobe'    -0.63
    ' Fellowship'    -0.63
    ' rece'          -0.62
    'demand'         -0.62

    Positive Logits
    'aclysm'          1.40
    'alogue'          1.26
    'alog'            1.16
    'apult'           1.15
    'cat'             1.10
    'hedral'          1.08
    'heter'           1.05
    'alyst'           1.05
    'cher'            1.00
    'chers'           0.96

    Act Density 0.013%

    No Known Activations