INDEX
    Explanations

    concepts related to philosophical discussions and critiques

    Negative Logits
    uer       -0.18
    atus      -0.16
    argo      -0.15
    uur       -0.15
    asso      -0.15
    ogle      -0.14
    rock      -0.14
     rub      -0.14
    ron       -0.14
    ally      -0.14
    Positive Logits
    uarios    0.16
    ashtra    0.15
    monds     0.15
    outers    0.15
    .Apis     0.15
    anford    0.15
    éϤ        0.15
     ê¶Į      0.14
    offsetof  0.14
    nez       0.14
    Activation Density 0.274%

    No Known Activations