INDEX
    Explanations
    Negative Logits
    Token             Logit
    " cereals"        -0.09
    "igar"            -0.08
    "libs"            -0.08
    "lang"            -0.08
    "Responder"       -0.08
    "авяз"            -0.08
    "langs"           -0.08
    "به"              -0.08
    "_classifier"     -0.08
    " classifier"     -0.07
    Positive Logits
    Token             Logit
    " accuracy"       0.16
    " inaccuracies"   0.15
    " Accuracy"       0.14
    " inaccurate"     0.14
    "Accuracy"        0.14
    "准确"            0.13
    "可靠"            0.13
    "accuracy"        0.13
    " reliability"    0.13
    " Accurate"       0.13
    Activation Density: 0.035%

    No Known Activations