INDEX
    Explanations

    prominent names and figures in various contexts

    Negative Logits
    ses       -0.21
    ence      -0.20
    sing      -0.19
    olly      -0.17
    /fw       -0.16
    ned       -0.16
    olley     -0.16
    oga       -0.15
    unds      -0.15
    oldem     -0.15
    Positive Logits
    arin      0.17
    ito       0.17
    pace      0.17
    igans     0.16
    pawn      0.16
    laus      0.15
    ãĥ¥       0.15
    asaki     0.15
    itos      0.14
    iana      0.14
    Activation Density: 0.696%

    No Known Activations