INDEX
    NEGATIVE LOGITS
    -0.07
    ague
    -0.07
    _err
    -0.07
    immune
    -0.06
     minute
    -0.06
     strict
    -0.06
     questionnaire
    -0.06
     discharged
    -0.06
     erroneous
    -0.06
    しか
    -0.06
    POSITIVE LOGITS
    " top"    0.20
    " Top"    0.18
    "top"     0.18
    "Top"     0.17
    " TOP"    0.16
    "_top"    0.13
    "(top"    0.13
    "TOP"     0.13
    "\ttop"   0.13
    ".top"    0.13
    Act Density 0.031%

    No Known Activations