INDEX
    Explanations

    phrases indicating addition or inclusion

    Negative Logits
    -0.73
    persky          -0.66
    JspWriter       -0.66
    Demikian        -0.61
     Hui            -0.59
    TouchListener   -0.59
     leaft          -0.59
    داية            -0.58
     Photocase      -0.58
     tört           -0.58
    Positive Logits
     besides    0.83
    Besides     0.77
     Besides    0.76
    besides     0.69
    enumi       0.67
     Além       0.65
    Além        0.64
     Oltre      0.62
    Oltre       0.62
     being      0.62
    Activation Density: 0.091%

    No Known Activations