INDEX
    Explanations

    names, especially the name "Joseph"

    Negative Logits
        Token      Logit
        APD        -0.72
         GOODMAN   -0.71
         "$:/      -0.70
        hips       -0.69
        atron      -0.68
        ricted     -0.65
        idad       -0.63
        raints     -0.63
         warr      -0.62
        ADRA       -0.60
    Positive Logits
        Token      Logit
        smanship   1.04
        uth        0.80
        hawks      0.77
        fires      0.77
        fters      0.76
        bard       0.72
        tein       0.71
        fruit      0.71
        fire       0.70
        ry         0.70
    Activation Density: 1.549%

    No Known Activations