    Explanations

    names of celebrities

    references to awards or notable achievements

    Negative Logits
    Token           Logit
    "clair"         -0.69
    "interstitial"  -0.68
    "dden"          -0.67
    " alignment"    -0.66
    " lateral"      -0.64
    "LOG"           -0.64
    " removable"    -0.63
    " phyl"         -0.62
    " logged"       -0.62
    "igmatic"       -0.61
    Positive Logits
    Token           Logit
    " Rus"          2.24
    " Wins"         2.17
    " Gos"          1.83
    " Won"          1.49
    "Rus"           1.17
    " Bis"          1.06
    " Kaw"          1.05
    " Cars"         1.05
    "Wars"          1.00
    " Krish"        0.97
    Activation Density: 0.014%

    No Known Activations