INDEX
    Explanations

    gendered terms and specific article usage in context

    Negative Logits
    RegressionTest    -0.57
     mattino          -0.57
                      -0.57
    ії                -0.55
     समीक्षाओं         -0.54
     avancé           -0.54
     mourut           -0.54
     onAnimation      -0.54
     Denna            -0.53
     Затем            -0.53
    Positive Logits
    Das               0.86
     Das              0.83
    ное               0.79
    ું                 0.79
     das              0.77
    Το                0.75
    льное             0.75
    ческое            0.74
    Het               0.74
    noe               0.72
    Act Density 0.117%

    No Known Activations