INDEX
    Explanations

    Frequent nouns and pronouns, suggesting a focus on identifying relationships between subjects and their actions or characteristics.

    Negative Logits (leading spaces inside quotes are part of the token)
    Token            Logit
    "tight"          -0.15
    "atos"           -0.15
    "imeo"           -0.14
    " antim"         -0.14
    "chos"           -0.14
    " innocent"      -0.14
    "anno"           -0.14
    "нг"             -0.14
    " Seymour"       -0.14
    "/Foundation"    -0.14

    Positive Logits
    Token            Logit
    "ibil"            0.16
    "ird"             0.16
    "isson"           0.15
    "Equality"        0.14
    "ilton"           0.14
    "aggi"            0.14
    " svens"          0.14
    ".pix"            0.14
    " çĿ"             0.14
    "çĿ"              0.14
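    The logit lists rank tokens by how strongly the feature pushes their output logits up or down. A minimal sketch of how such lists are commonly derived, assuming (not confirmed by this page) that each value is the dot product of the feature's decoder direction with a token's unembedding vector; the names `logit_effects` and `top_k` and the toy vocabulary are hypothetical:

```python
from typing import Dict, List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
) -> Dict[str, float]:
    """Assumed method: dot the feature's decoder direction with each token's
    unembedding vector to get that token's direct logit effect."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_k(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # smallest values first
    positive = ranked[::-1][:k]    # largest values first
    return negative, positive


# Hypothetical toy example: 2-dimensional residual stream, 3-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = {"a": [0.2, 0.0], "b": [0.0, 0.4], "c": [0.1, 0.0]}
negative, positive = top_k(logit_effects(decoder_dir, unembed), k=1)
print(negative, positive)
```

    Only the ranking matters for the lists above; the real dashboard may also normalize or scale the values differently.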
    Activation Density: 0.030%
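    An activation density of 0.030% means the feature fires on roughly 3 of every 10,000 tokens. A minimal sketch of the calculation, assuming "fires" means an activation strictly greater than zero (the dashboard's exact threshold is not stated here); `activation_density` is a hypothetical name:

```python
from typing import List


def activation_density(activations: List[float]) -> float:
    """Fraction of tokens on which the feature is active.

    Assumption: a token counts as active when its activation is > 0.
    """
    if not activations:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


# Hypothetical sample: 10,000 tokens, the feature fires on 3 of them.
acts = [0.0] * 10_000
for i in (0, 5, 9):
    acts[i] = 1.0
print(f"{activation_density(acts):.3%}")  # prints "0.030%"
```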

    No Known Activations