INDEX
    Explanations

    mentions of monkeys

    Negative Logits
    Token         Logit
    " Lauder"     -0.83
    "inen"        -0.82
    "reek"        -0.78
    "sburgh"      -0.77
    "EMP"         -0.75
    "oppable"     -0.73
    "encer"       -0.71
    "encers"      -0.71
    "OHN"         -0.70
    "aci"         -0.69

    Positive Logits
    Token         Logit
    "pox"          0.99
    " wrench"      0.90
    "sey"          0.87
    "patch"        0.82
    "ãĥ£"          0.79
    "oleon"        0.76
    " bitten"      0.73
    " monkeys"     0.72
    "zee"          0.70
    " monkey"      0.70
    Activation Density: 0.021%

    No Known Activations