INDEX
    Explanations
    Negative Logits
     implication  -0.06
     restraining  -0.06
     millennium   -0.06
     boy          -0.06
    *=            -0.06
    _principal    -0.06
    ekten         -0.06
    fixed         -0.06
    	expect  -0.06
    場合          -0.06
    Positive Logits
    unday      0.07
    avatel     0.07
    ΗΜ         0.07
    UTTON      0.06
    IPP        0.06
    igor       0.06
     gfx       0.06
     jersey    0.06
    (ERROR     0.06
    _filename  0.06
    Activation Density: 0.172%

    No Known Activations