INDEX
    Explanations

    names, specifically focusing on names ending in 'las' and 'Andreas'

    Negative Logits
    riter       -0.83
    lance       -0.81
    wards       -0.78
    isite       -0.77
    ished       -0.76
    fare        -0.75
    fecture     -0.75
    ewitness    -0.74
    lled        -0.74
    lished      -0.73
    Positive Logits
    henko       0.96
    andro       0.91
    lav         0.89
    andr        0.88
    '           0.87
    Magikarp    0.87
     Kats       0.85
     Maduro     0.84
     Jere       0.84
     Anton      0.84
    Activation Density: 0.081%

    No Known Activations