    Negative Logits
    " OFDb"               -0.75
    "audiovisuel"         -0.67
    "ThroughAttribute"    -0.65
    " PLWABN"             -0.63
    "Jako"                -0.62
    " ]}"                 -0.61
    "κης"                 -0.60
    "tanleria"            -0.60
    " ͡°)"                 -0.59
    " ویکی‌پدی"            -0.59

    Positive Logits
    " win"                0.84
    "win"                 0.76
    "Win"                 0.75
    " Win"                0.75
    " Winn"               0.73
    " WIN"                0.72
    " Winona"             0.70
    " Unwin"              0.70
    " battles"            0.68
    " Winning"            0.68

    Activation Density: 0.090%

    No Known Activations