    Explanations

    elements related to relationships and human connections

    Negative Logits
    Token       Logit
    "atest"     -0.15
    "/misc"     -0.15
    "-found"    -0.15
    "CADE"      -0.14
    "ationale"  -0.14
    "ungen"     -0.14
    "ivet"      -0.14
    "818"       -0.13
    "ossible"   -0.13
    " Cue"      -0.13
    Positive Logits
    Token        Logit
    " both"      0.20
    " indeed"    0.17
    "both"       0.17
    " even"      0.17
    " både"      0.16
    " actually"  0.16
    "ë²Į"        0.15
    " nejen"     0.15
    " Both"      0.15
    " actual"    0.14
    Activation Density: 0.013%

    No Known Activations