    Explanations

    performance

    Negative Logits
    Token            Logit
    " Agent"         -0.06
    " skins"         -0.06
    " dos"           -0.06
    "DoubleClick"    -0.06
    " bite"          -0.06
    " Gujarat"       -0.05
    " Indo"          -0.05
    " SNP"           -0.05
    " Kraft"         -0.05
    "_bt"            -0.05
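    Positive Logits
    Token              Logit
    " performance"     0.13
    " Performance"     0.10
    "Performance"      0.09
    "performance"      0.08
    "-performance"     0.08
    " performances"    0.07
    "效果"             0.07   (Chinese: "effect")
    "性能"             0.07   (Chinese: "performance")
    "\t↵\t↵↵"          0.07   (whitespace token: tabs and newlines)
    "\tlabels"         0.07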
    Activation Density: 0.019%

    No Known Activations