    Explanations

    numerals and specific words in various languages or scripts

    Negative Logits
    ه                 -0.81
    م                 -0.67
    у                 -0.66
    ي                 -0.65
    ο                 -0.63
     själva           -0.62
    \{\\              -0.62
    ر                 -0.62
    е                 -0.58
    י                 -0.56

    Positive Logits
    featureID          0.54
    e                  0.51
    وفاته              0.48
    يكب                0.46
    B                  0.43
     NPs               0.42
    IndentedString     0.41
     characterised     0.41
    J                  0.40
    moga               0.40
    Activation Density: 0.019%
