INDEX
    Explanations

    words associated with authority and control

    Negative Logits
    浩          -0.16
     Kob        -0.15
     spare      -0.15
    amate       -0.15
    unes        -0.15
    /archive    -0.14
     spou       -0.14
    amet        -0.14
     PyErr      -0.14
    ugin        -0.14

    Positive Logits
    imits       0.16
    áli         0.15
    vit         0.15
    Descricao   0.15
    bart        0.14
    nds         0.14
    midi        0.14
    GRE         0.14
     Avg        0.14
     Tanrı      0.14
    Act Density 0.001%

    No Known Activations