INDEX
    Explanations

    the word "pit" followed by a high activation number

    repeated mentions of "pit bull."

    Negative Logits
    Token (quoted to show leading spaces)    Logit
    " ãĤµãĥ¼ãĥĨãĤ£ãĥ¯ãĥ³"    -0.71
    " Carbuncle"    -0.69
    " Lauder"       -0.66
    " challeng"     -0.65
    " proport"      -0.64
    "IGH"           -0.63
    " Feinstein"    -0.62
    "lihood"        -0.61
    " Polo"         -0.60
    " Leilan"       -0.59
    Positive Logits
    Token          Logit
    "iful"         1.35
    "cair"         1.28
    "ifully"       1.27
    "iless"        1.24
    "cher"         1.08
    "bull"         0.94
    "uit"          0.90
    "iable"        0.88
    "adium"        0.87
    "reatment"     0.86
    Activation Density: 0.039%

    No Known Activations