INDEX
    Explanations

    references to categories of societal structures and influential figures

    Negative Logits
    serter
    -0.15
    unused
    -0.15
    ーボ
    -0.15
    ohen
    -0.15
    ịp
    -0.15
    RuleContext
    -0.14
    óa
    -0.14
    бот
    -0.14
    احل
    -0.14
    olib
    -0.14
    Positive Logits
     as
    0.35
     как
    0.20
     quanto
    0.19
    なら
    0.18
     als
    0.17
     than
    0.17
     sebagai
    0.17
    ong
    0.16
     että
    0.16
     كما
    0.14
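    Logit lists like the two above are commonly produced, in sparse-autoencoder dashboards, by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below assumes that method; it is not confirmed by this page. The names `logit_effects` and `top_and_bottom`, the 2-dimensional vectors, and all weights are hypothetical toy values, not the real model's parameters.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the (hypothetical) feature decoder direction with each token's unembedding vector."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, col)) for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Toy example with made-up numbers; the real model has a much larger hidden size and vocabulary.
decoder_dir = [1.0, 0.5]
unembed = {
    " as": [0.3, 0.1],
    " than": [0.1, 0.1],
    "serter": [-0.1, -0.1],
    "olib": [0.0, -0.2],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print([tok for tok, _ in positive])  # [' as', ' than']
print([tok for tok, _ in negative])  # ['serter', 'olib']
```

    In this toy setup, a token ranks high when its unembedding vector points the same way as the feature's decoder direction, so boosting the feature raises that token's logit.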
    Activation Density 0.055%
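    Assuming the density figure is the percentage of token positions on which the feature's activation is nonzero, 0.055% means the feature fires on roughly one token in 1,800. A minimal sketch of that calculation follows; the function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float]) -> float:
    """Percentage of token positions where the feature's activation is nonzero."""
    if not activations:
        raise ValueError("need at least one token position")
    active = sum(1 for a in activations if a != 0)
    return 100.0 * active / len(activations)

# Hypothetical sample: the feature is active on 55 of 100,000 token positions.
sample = [0.0] * 100_000
for i in range(55):
    sample[i * 1000] = 1.0
print(activation_density(sample))  # 0.055
```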

    No Known Activations