INDEX
    Explanations

    references to traditional customs or cultural practices

    Negative Logits
    /we       -0.17
    bras      -0.17
    thing     -0.17
    /he       -0.15
    asons     -0.14
    oll       -0.14
    lict      -0.14
    íıī       -0.14
    ages      -0.13
    rowse     -0.13
    Positive Logits
    ists        0.25
    /current    0.20
    ively       0.20
    /original   0.20
    ist         0.19
    ised        0.19
    ized        0.19
    mente       0.18
    itionally   0.18
    ism         0.18
    Activation Density: 0.028%

    No Known Activations