INDEX
    Explanations
    Negative Logits
    Token         Logit
    "ignment"     -0.76
    "ition"       -0.72
    "heed"        -0.69
    "mith"        -0.69
    "ulet"        -0.68
    "rooms"       -0.65
    "=-=-=-=-"    -0.64
    "hur"         -0.64
    "itol"        -0.63
    "child"       -0.62
    Positive Logits
    Token         Logit
    " Hoover"     0.89
    "ALLY"        0.87
    "swick"       0.79
    " Cheong"     0.78
    " Draper"     0.77
    " Ernest"     0.75
    "ally"        0.73
    " Hem"        0.72
    " Byrne"      0.72
    "ATIONS"      0.71
    Activation Density: 0.047%

    No Known Activations