    Explanations

    take action, care, control, permission

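    Negative Logits
    ईसाई (Hindi, "Christian")    0.40
    (token not rendered)    0.40
    (token not rendered)    0.39
    (token not rendered)    0.39
    subsidi    0.38
    Toni    0.38
    参数 (Chinese, "parameter")    0.38
    (token not rendered)    0.38
    (token not rendered)    0.37
    isp    0.36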
    Positive Logits
    assistance    0.51
    help    0.50
    lap    0.45
    取る (Japanese, "to take")    0.45
    permission    0.45
    取り (Japanese, "taking")    0.44
    resh    0.44
    forward    0.42
    Help    0.42
    (token not rendered)    0.41
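    Lists like the logit lists above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking tokens by the result: the largest scores are the tokens the feature pushes up, the most negative are the tokens it pushes down. The page does not say how its lists were computed, so the sketch below is an assumption; `top_logits`, `decoder_dir`, and `unembed` are hypothetical names, and plain lists stand in for real model weights.

```python
from typing import List, Tuple

def top_logits(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top_positive, top_negative) lists of (token, score).

    Assumed setup (hypothetical, not taken from the dashboard):
    decoder_dir -- the feature's decoder vector, length d_model
    unembed     -- d_model x vocab_size unembedding matrix (nested lists)
    vocab       -- token strings, one per column of unembed
    """
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the decoder direction with this token's unembedding column.
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((token, s))
    top_pos = sorted(scores, key=lambda t: t[1], reverse=True)[:k]  # most boosted
    top_neg = sorted(scores, key=lambda t: t[1])[:k]                # most suppressed
    return top_pos, top_neg

# Toy example: 2-dimensional residual stream, 3-token vocabulary.
pos, neg = top_logits([1.0, 0.0], [[0.5, -0.4, 0.1], [0.0, 0.0, 0.0]],
                      [" help", " subsidi", " lap"], k=1)
print(pos, neg)
```

    Real pipelines may also fold in the final layer norm or other scaling before ranking; the sketch omits that.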
    Activation density: 0.002%

    No Known Activations
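    An activation density of 0.002% means the feature fires on roughly 2 of every 100,000 token positions, which is consistent with no example activations being recorded. A minimal sketch of the calculation, assuming a feature "fires" whenever its activation is above zero (the page does not state its threshold); `activation_density` is a hypothetical helper:

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not the dashboard's documented rule.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# 2 firing positions out of 100,000 tokens.
acts = [0.0] * 99_998 + [1.3, 0.7]
print(activation_density(acts))
```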