    Explanations

    the name "Harold" at varying levels of activation

    instances of the name "Harold."

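    Negative Logits
      Token            Logit
      "eanor"          -0.89
      "hetically"      -0.83
      "psey"           -0.81
      "igger"          -0.81
      "insula"         -0.80
      "ongs"           -0.77
      "agogue"         -0.74
      "oing"           -0.73
      "arnaev"         -0.73
      "ocrats"         -0.73

    Positive Logits
      Token            Logit
      " Harold"         0.86
      " Lank"           0.82
      " Kut"            0.80
      " McGee"          0.79
      " Vaj"            0.77
      " Rupert"         0.76
      " Melvin"         0.74
      "balls"           0.70
      " Weinstein"      0.70
      " Cald"           0.67

    (Tokens are quoted; a leading space inside the quotes is part of the token.)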
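Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the result. The sketch below illustrates that idea under that assumption only. The function names, the toy vectors, and the three-token vocabulary are hypothetical. The scores on this page may also be scaled or normalized differently than a raw dot product.

```python
# Hypothetical sketch: rank tokens by how strongly a feature's decoder
# direction raises or lowers their logits (a "logit lens" on the feature).
# Assumes the dashboard's logit lists are computed this way; not confirmed.

def dot(u, v):
    """Plain dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def logit_effects(decoder_dir, unembed, vocab):
    """Score each vocabulary token.

    decoder_dir: list of d_model floats (the feature's decoder direction).
    unembed: d_model x vocab_size nested list (unembedding matrix).
    vocab: list of vocab_size token strings.
    Returns a list of (token, score) pairs.
    """
    d_model = len(decoder_dir)
    scores = []
    for j, token in enumerate(vocab):
        column = [unembed[i][j] for i in range(d_model)]
        scores.append((token, dot(decoder_dir, column)))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    positive = sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda pair: pair[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy data (hypothetical): 2-dimensional model, 3-token vocabulary.
    decoder_dir = [1.0, 0.0]
    unembed = [[0.9, -0.8, 0.1],
               [0.0, 0.5, 0.2]]
    vocab = [" Harold", "eanor", " the"]
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=1)
    print("positive:", pos)
    print("negative:", neg)
```

With the toy data, " Harold" lands at the top and "eanor" at the bottom, mirroring the shape (not the values) of the lists above. Sorting twice keeps the code simple; with a real vocabulary of tens of thousands of tokens, a vectorized top-k would be the practical choice.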
    Activation density: 0.020%

    No Known Activations