    Explanations

    instances of social norms and practices

    Negative Logits
    lí
    -0.17
    owitz
    -0.16
    adb
    -0.16
     Quận
    -0.14
     Arbitrary
    -0.14
    оже
    -0.14
    Zero
    -0.14
    kJ
    -0.14
    ерк
    -0.13
    ication
    -0.13
    Positive Logits
     even
    0.24
     almost
    0.20
    even
    0.19
     sometimes
    0.17
     даже
    0.17
     sogar
    0.17
    almost
    0.17
    pais
    0.16
     actually
    0.15
     EVEN
    0.15
    Activation Density: 0.054%

    No Known Activations