    Explanations
    Negative Logits
     philosoph    -0.08
    族自治    -0.08
    parency    -0.08
    accuracy    -0.07
    roe    -0.07
    roids    -0.07
     Berger    -0.07
     TOT    -0.07
     Blueprint    -0.07
     Azer    -0.07
    Positive Logits
     utensils    0.09
     handig    0.09
     handy    0.09
     כדאי    0.08
     spark    0.08
     حف    0.08
     зүйл    0.08
     понадобится    0.08
     तयारी    0.08
     equipments    0.08
    Activation Density: 0.010%

    No Known Activations