INDEX
    Explanations

    expressions of confusion or uncertainty related to new experiences or learning

    Negative Logits
    atori      -0.15
    กรรม       -0.15
    aked       -0.14
    ẩm         -0.14
    auty       -0.14
    idlo       -0.14
     span      -0.14
    XT         -0.13
    isci       -0.13
    135        -0.13
    Positive Logits
    IPA        0.17
    uggage     0.16
    >NN        0.16
    celik      0.15
    Slave      0.14
    $LANG      0.14
    idas       0.14
    习         0.14
    rias       0.14
    _NR        0.14
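The logit lists rank vocabulary tokens by how strongly the feature pushes the model's output toward (positive) or away from (negative) each token. Below is a minimal sketch of one common way to produce such lists. It assumes the effect on token t is the dot product of the feature's decoder direction with column t of the unembedding matrix. The function names and toy data are hypothetical and are not taken from the dashboard's own code.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float], unembed: List[List[float]]) -> List[float]:
    # Assumption: unembed[i][t] is the weight from residual dimension i to
    # vocabulary token t (shape d_model x vocab). The effect on token t is
    # the dot product of the decoder direction with column t.
    d_model, vocab = len(unembed), len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab)
    ]

def top_bottom(
    effects: List[float], tokens: List[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort token ids by effect, ascending.
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    # Most negative first, matching a "Negative Logits" list.
    negative = [(tokens[t], effects[t]) for t in order[:k]]
    # Most positive first, matching a "Positive Logits" list.
    positive = [(tokens[t], effects[t]) for t in reversed(order[-k:])]
    return positive, negative

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
effects = logit_effects([1.0, -1.0], [[1.0, 0.0, 3.0], [0.0, 1.0, 1.0]])
positive, negative = top_bottom(effects, ["a", "b", "c"], k=1)
```

In this toy case the effects are [1.0, -1.0, 2.0], so "c" heads the positive list and "b" heads the negative list.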
    Act Density 0.121%
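Activation density is read here as the share of sampled token positions on which the feature fires. At 0.121%, that works out to roughly 121 of every 100,000 tokens. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. The function name and the default threshold are hypothetical.

```python
from typing import List

def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    # Percentage of token positions whose activation exceeds the threshold.
    # The threshold of 0.0 is an assumed convention, not the dashboard's.
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# One firing position out of 1,000 sampled tokens gives a density of 0.1%.
density = activation_density([0.0] * 999 + [0.7])
```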

    No Known Activations