    Explanations

    phrases and comparisons that highlight differences in treatment or situations

    Negative Logits
    Token                 Logit
    "WriteBarrier"        -0.95
    ":✨"                  -0.89
    " صوتيه"              -0.87
    "RegressionTest"      -0.85
    "Datuak"              -0.84
    "CodedInputStream"    -0.79
    " متعلقه"             -0.78
    "DrawerToggle"        -0.77
    " Wiktionnaire"       -0.76
    " [])."               -0.76
    Positive Logits
    Token                 Logit
    "compare"             0.64
    " compare"            0.64
    "compar"              0.63
    " compared"           0.62
    " compar"             0.58
    "Compare"             0.55
    "compared"            0.54
    " comparison"         0.54
    " compares"           0.54
    " Comparison"         0.53
    Activation Density: 0.215%

    No Known Activations