INDEX
    NEGATIVE LOGITS
     ));↵↵
    -0.07
    xffff
    -0.07
     dims
    -0.07
    Datas
    -0.07
     drill
    -0.06
    錯誤
    -0.06
    -bedroom
    -0.06
    ';
    -0.06
    _topics
    -0.06
    -0.06
    POSITIVE LOGITS
     exponential
    0.07
     linguistic
    0.07
    stant
    0.07
    0.07
    0.07
     interpreted
    0.07
    sto
    0.07
     ela
    0.07
     перевод
    0.07
    	not
    0.07
    Act Density 0.005%

    No Known Activations