INDEX
    Explanations

    relationships and interactions between different subjects or characters

    Negative Logits
    Token       Logit
    "èm"        -0.15
    "afil"      -0.15
    "uppe"      -0.14
    "olon"      -0.14
    "ronics"    -0.14
    "ksen"      -0.14
    "oltip"     -0.14
    "ftar"      -0.13
    "prung"     -0.13
    "arias"     -0.13
    Positive Logits
    Token       Logit
    " both"     0.49
    " Both"     0.46
    "both"      0.46
    "Both"      0.45
    " ambos"    0.43
    "两人"      0.43
    "_both"     0.43
    " beide"    0.42
    " mutual"   0.42
    " BOTH"     0.41
    Activation Density: 0.555%

    No Known Activations