    Explanations

    names of individuals and significant historical figures

    Negative Logits
    %)$           -0.77
    ÉM            -0.75
    siez          -0.71
    ViewImports   -0.70
    omation       -0.68
    oine          -0.68
    bode          -0.68
    eaway         -0.67
    ."</          -0.65
    INTEN         -0.65
    Positive Logits
     himself      0.91
    '             0.79
                  0.68
     Himself      0.62
    himself       0.61
    ssohn         0.59
     Seeder       0.55
     who          0.54
     nødven       0.54
     desnuda      0.54
    Act Density 0.339%

    No Known Activations