INDEX
    Explanations

    phrases indicating denial or refusal

    NEGATIVE LOGITS
    "iasi"                       -0.15
    U+0306 (combining breve)     -0.15
    "ddy"                        -0.15
    "ذ"                          -0.14
    ":CGRect"                    -0.14
    "utherford"                  -0.14
    "ibbon"                      -0.13
    "andro"                      -0.13
    "isible"                     -0.13
    "CTest"                      -0.13
    POSITIVE LOGITS
    " need"                       0.54
    " must"                       0.53
    " needs"                      0.50
    "必须" (Chinese: "must")      0.48
    "need"                        0.47
    " gotta"                      0.47
    "must"                        0.46
    " phải" (Vietnamese: "must")  0.44
    " Must"                       0.44
    "needs"                       0.44
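    The page does not say how the two logit lists are produced. A common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by the page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k highest and k lowest entries. The following is a minimal pure-Python sketch under that assumption; the function names feature_logit_effects and top_and_bottom, and the toy weights, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: each listed logit is the direct effect of one feature's decoder
# direction on a vocabulary logit, i.e. decoder_row @ unembed.
# All names here are hypothetical, not a specific library's API.


def feature_logit_effects(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],  # indexed [d_model][vocab]
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Map each token to the dot product of decoder_row with its unembedding column."""
    return {
        token: sum(decoder_row[i] * unembed[i][j] for i in range(len(decoder_row)))
        for j, token in enumerate(vocab)
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    negative = sorted(ranked[-k:], key=lambda kv: kv[1])  # most negative first
    return positive, negative


# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = [" need", " must", "iasi", "ddy"]
decoder_row = [1.0, 0.0]
unembed = [
    [0.54, 0.53, -0.15, -0.14],
    [0.10, 0.20, 0.30, 0.40],
]
positive, negative = top_and_bottom(feature_logit_effects(decoder_row, unembed, vocab), k=2)
print(positive)  # [(' need', 0.54), (' must', 0.53)]
print(negative)  # [('iasi', -0.15), ('ddy', -0.14)]
```

    Because the toy decoder row only points along the first hidden dimension, the ranking simply reads off that row of the unembedding. A real dashboard may apply extra steps (for example, folding in a final layer norm) that this sketch leaves out.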
    Activation density: 0.559%
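    The page gives no definition of activation density. Assuming it means the percentage of sampled token positions on which the feature's activation exceeds zero, it reduces to a simple ratio, and 0.559% would correspond to about 5.6 firing positions per 1,000 tokens. A minimal sketch of that reading follows; the function name activation_density and its threshold default are hypothetical, and the dataset the real figure was measured on is not stated.

```python
from typing import Sequence

# Assumption: "activation density" = percentage of sampled token positions at
# which the feature's activation is strictly above a threshold (0 by default).
# The page does not state the dataset or threshold actually used.


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of positions whose activation is strictly above threshold."""
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy example: 559 firing positions out of 100,000 gives 0.559%.
acts = [1.0] * 559 + [0.0] * (100_000 - 559)
print(f"{activation_density(acts):.3f}%")  # 0.559%
```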

    No Known Activations