    Explanations

    negative or problematic descriptors in various contexts

    NEGATIVE LOGITS
    Token          Logit
    "RITE"         -0.17
    "owel"         -0.16
    "çĽ"           -0.15
    "ikut"         -0.15
    "à¸ŀà¸Ń"       -0.14
    "rani"         -0.14
    "ê¹ĮìļĶ"       -0.14
    "ottle"        -0.14
    " somewhat"    -0.14
    "rance"        -0.13
    POSITIVE LOGITS
    Token          Logit
    " nor"         0.40
    " anymore"     0.29
    "nor"          0.28
    " Nor"         0.24
    "Nor"          0.23
    " neither"     0.20
    " anywhere"    0.20
    " NOR"         0.19
    " sondern"     0.19
    " ani"         0.18
    Activation Density: 0.484%

    No Known Activations