INDEX
    Explanations

    phrases that express observation or inquiry

    Negative Logits
    休
    -0.16
    agas
    -0.15
    onet
    -0.15
    いる
    -0.14
    itches
    -0.14
    corp
    -0.14
    seed
    -0.14
    freeze
    -0.14
    YSTEM
    -0.13
    irk
    -0.13
    Positive Logits
     if
    0.35
     whether
    0.25
    	if
    0.23
     how
    0.20
     what
    0.20
     nếu
    0.18
     если
    0.17
     about
    0.17
     wenn
    0.17
    _if
    0.17
    Activation Density 0.023%

    No Known Activations