INDEX
    Explanations

    themes related to social structures and cultural elements

    Negative Logits
     Both
    -0.20
    Both
    -0.19
     beide
    -0.19
     обо
    -0.17
     BOTH
    -0.16
    两人
    -0.16
    537
    -0.16
    _both
    -0.16
    両
    -0.15
    二人
    -0.14
    Positive Logits
     etc
    0.32
     all
    0.31
    etc
    0.28
    等
    0.25
    —all
    0.22
     등을
    0.21
     hepsi
    0.21
     altogether
    0.20
    tc
    0.20
     등의
    0.20
    Act Density 0.504%

    No Known Activations