    Explanations

    References to individuals and their actions or feelings, indicating a focus on characters and how they interact.

    Negative Logits
    Token          Logit
    "ly"           -0.16
    "bsp"          -0.15
    " Clo"         -0.15
    "iously"       -0.14
    "enz"          -0.14
    "öy"           -0.14
    " comm"        -0.14
    "ely"          -0.14
    "aho"          -0.14
    "arily"        -0.13
    Positive Logits
    Token          Logit
    "oret"          0.18
    " ведь"         0.16
    " certainly"    0.16
    "orem"          0.15
    "inerary"       0.15
    "alth"          0.14
    "oretical"      0.14
    "iag"           0.14
    "zel"           0.14
    "intros"        0.14
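    Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and ranking the vocabulary by the resulting score. The sketch below assumes that method; the function name `top_logit_effects` and the array shapes are hypothetical and are not taken from this page's tooling.

```python
import numpy as np

def top_logit_effects(decoder_row, unembed, vocab, k=10):
    """Rank tokens by the feature's direct effect on the output logits.

    Hypothetical helper (assumed method, not this page's pipeline):
    decoder_row has shape (d_model,), unembed has shape (d_model, vocab_size).
    Returns (most_negative, most_positive) as lists of (token, score) pairs.
    """
    scores = decoder_row @ unembed  # one score per vocabulary entry
    order = np.argsort(scores)      # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Tiny usage example with a made-up 2-dim model and 3-token vocabulary.
neg, pos = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.2, -0.1, 0.5], [0.0, 0.0, 0.0]]),
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

    Sorting once and slicing from both ends yields the negative and positive lists in a single pass over the scores.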
    Activation Density: 0.563%
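    Activation density is usually reported as the share of sampled token positions on which the feature fires. The sketch below assumes "fires" means a strictly positive activation; the helper name `activation_density` is hypothetical, and the exact threshold and sample used for this page are not stated here.

```python
import numpy as np

def activation_density(activations):
    """Percentage of token positions with a strictly positive activation.

    Hypothetical helper; assumes a zero threshold, which may differ from
    the one used to compute the figure on this page.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > 0) / acts.size

print(activation_density([0.0, 0.3, 0.0, 0.0]))
```

    At 0.563%, the feature would be active on roughly 5 or 6 of every 1,000 sampled tokens.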

    No Known Activations