    Negative Logits
    enga
    -0.10
     consort
    -0.10
     Федерации
    -0.09
    くらい
    -0.09
    Ｖ
    -0.09
     Weston
    -0.09
    \tCopyright
    -0.08
    EditingStyle
    -0.08
    avad
    -0.08
     Austral
    -0.08
    Positive Logits
     improvement
    0.36
     Improvement
    0.27
     improvements
    0.21
     improved
    0.20
     improve
    0.19
     growth
    0.17
    Impro
    0.17
     improving
    0.16
     miglior
    0.16
    改
    0.15
    Activation Density: 0.023%

    No Known Activations