INDEX
    Explanations

    names of celebrities, prominent figures or athletes

    Negative Logits
    "awks"            -0.68
    "ories"           -0.66
    "uate"            -0.66
    "iliate"          -0.63
    "Els"             -0.63
    "raq"             -0.62
    "sequence"        -0.62
    "Load"            -0.61
    "rals"            -0.60
    "olulu"           -0.60

    Positive Logits
    " Sr"              1.35
    " Jr"              1.31
    " III"             1.05
    " aka"             0.94
    "ovich"            0.91
    "Jr"               0.87
    " Productions"     0.86
    " Returns"         0.86
    " greets"          0.83
    " IV"              0.82
    Activation Density: 0.292%

    No Known Activations