    Negative Logits
     utafitiHapana      -0.83
     with               -0.81
    etheless            -0.80
     يتيمه              -0.80
     ویکی‌پدیا           -0.77
     although           -0.77
    ConstraintMaker     -0.77
    UIControlState      -0.77
     maybe              -0.75
     pleaſure           -0.74
    Positive Logits
    '                    0.85
    <bos>                0.84
                         0.75
    &                    0.57
    -                    0.56
    _                    0.54
    #                    0.52
                         0.52
    "                    0.50
    &#                   0.50
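The negative and positive logit lists above rank vocabulary tokens by how strongly this feature suppresses or promotes them. The following is a minimal sketch of one common way to compute such lists. It assumes that a token's logit effect is the dot product of the feature's decoder vector with that token's column of the model's unembedding matrix, and it ignores any final layer norm. The function names `logit_effects` and `top_logits` and all numbers are hypothetical toy values, not data from this feature.

```python
# Sketch (assumption): logit effect of token t = dot(decoder_vec, W_U[:, t]).
# Final layer norm and biases are ignored; all values below are toy data.

def logit_effects(decoder_vec, unembed, vocab):
    """Return {token: dot product of decoder_vec with the token's unembedding column}.

    unembed is a d_model x vocab_size matrix given as a list of rows.
    """
    d_model = len(decoder_vec)
    effects = {}
    for t, token in enumerate(vocab):
        effects[token] = sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
    return effects


def top_logits(effects, k):
    """Return (k most positive, k most negative) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]                    # most negative first
    positive = list(reversed(ranked[-k:]))   # most positive first
    return positive, negative


if __name__ == "__main__":
    # Hypothetical 2-dimensional model with a 4-token vocabulary.
    vocab = ["'", "<bos>", " with", " maybe"]
    unembed = [
        [0.5, 0.4, -0.3, -0.2],
        [-0.2, -0.3, 0.6, 0.4],
    ]
    decoder_vec = [1.0, -0.5]
    pos, neg = top_logits(logit_effects(decoder_vec, unembed, vocab), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Sorting the whole vocabulary once and slicing both ends yields the two lists in a single pass. For a real vocabulary of hundreds of thousands of tokens, a vectorized matrix product followed by a top-k selection would be the practical choice.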
    Activation density: 0.522%
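An activation density of 0.522% means the feature fires on roughly 5 of every 1,000 token positions (0.00522 × 1,000 ≈ 5.2). The sketch below assumes that density is the percentage of positions whose activation is strictly above zero. The zero threshold and the function name `activation_density` are assumptions for illustration.

```python
# Sketch (assumption): density = share of token positions with activation > threshold,
# reported as a percentage. The default threshold of 0.0 is an assumption.

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 5 of 1,000 positions.
    sample = [0.0] * 995 + [0.3] * 5
    print(activation_density(sample))
```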

    No Known Activations