    Explanations

    important nouns and their relationships to actions and interests

    Negative Logits
    Token         Logit
    " bro"        -0.15
    "رس"          -0.15
    " Kod"        -0.14
    "prom"        -0.14
    "ibling"      -0.14
    "rone"        -0.14
    "èm"          -0.14
    "pal"         -0.13
    "plier"       -0.13
    "avel"        -0.13
    Positive Logits
    Token         Logit
    "swith"       0.17
    "oints"       0.15
    "[s"          0.15
    "lessly"      0.15
    "ennis"       0.14
    "ws"          0.14
    "inous"       0.14
    "klad"        0.14
    "storybook"   0.14
    "ssf"         0.14
    Act Density 0.651%

    No Known Activations