INDEX
    Explanations

    metrics and scores related to evaluation criteria

    Negative Logits
    "DeleteBehavior"     -0.55
    " excru"             -0.52
    "expandindo"         -0.52
    " disponibilités"    -0.51
    "idavit"             -0.49
    ")_/¯"               -0.49
    " miniatur"          -0.49
    " ujednoznacz"       -0.47
    " Baillargeon"       -0.45
                         -0.45

    Positive Logits
    " score"              2.53
    " scores"             2.26
    "score"               2.12
    " scoring"            2.10
    " Score"              2.08
    " scored"             2.03
    " rating"             2.01
    "Score"               1.93
    " Scores"             1.92
    " Scoring"            1.81
    Act Density 0.548%

    No Known Activations