INDEX
    Explanations

    phrases indicating causation or dependency

    Negative Logits
    ini
    -0.16
    ä¸ģ缮
    -0.16
    tiny
    -0.14
    ordan
    -0.14
    iero
    -0.14
     uniformly
    -0.13
    alan
    -0.13
    idi
    -0.13
    vides
    -0.13
    Å
    -0.13
    Positive Logits
     partially
    0.67
     partly
    0.62
     partial
    0.53
     Partial
    0.53
    partial
    0.48
    Partial
    0.43
    .partial
    0.39
    _partial
    0.37
     جزئ
    0.36
     част
    0.35
    Activation Density 0.159%

    No Known Activations