    Explanations

    references to social interactions and dining settings

    Negative Logits
    Token              Logit
    "n"                -0.31
    "sa"               -0.31
    "ngths"            -0.29
    "uib"              -0.29
    "ko"               -0.28
    "ко"               -0.28
    "ad"               -0.28
    " Tübingen"        -0.28
    (token not shown)  -0.28
    "↵↵"               -0.28
    Positive Logits
    Token              Logit
    "iſchen"            0.68
    " صوتيه"            0.63
    "ſicht"             0.63
    "ſſung"             0.62
    "iſche"             0.61
    "<unused41>"        0.61
    "<unused23>"        0.61
    "<unused28>"        0.61
    "<unused14>"        0.61
    "<unused8>"         0.61
    Activation density: 0.142%

    No known activations.