    Explanations

    references to citations and academic validation

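    Negative Logits
        Token            Logit   English gloss
        " its"           -1.05
        " Its"           -0.91
        " яке"           -0.90   Ukrainian "which" (neuter)
        "Its"            -0.89
        " Оно"           -0.82   Russian "it" (neuter)
        "its"            -0.81
        " которое"       -0.81   Russian "which" (neuter)
        [not rendered]   -0.78
        "它的"           -0.78   Chinese "its"
        " it"            -0.72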
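    Positive Logits
        Token            Logit   English gloss
        " celles"         1.25   French "those" (feminine plural)
        " herself"        1.20
        " ones"           1.12
        " lesquelles"     1.08   French "which" (feminine plural)
        "herself"         1.00
        " ellas"          0.99   Spanish "they" (feminine)
        "Elles"           0.96   French "they" (feminine)
        " she"            0.92
        " éstas"          0.90   Spanish "these" (feminine plural)
        " Elles"          0.88   French "they" (feminine)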
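    The two lists above are the most negative and most positive entries of a single per-token score vector, each sorted by magnitude. A minimal sketch of that selection step follows, assuming the scores are already available as a token-to-score mapping (for example, the feature's decoder direction projected onto the unembedding; the page does not say how its scores were computed). The function name `top_logits` is hypothetical, and the sample values are a subset copied from this page.

```python
# Minimal sketch: split a per-token score vector into its k most negative
# and k most positive entries, mirroring the two lists on this page.
# ASSUMPTION: `scores` maps token strings to the feature's logit effect;
# how those scores are produced is not stated by the source.

def top_logits(
    scores: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]        # ascending order: most negative first
    positive = ranked[::-1][:k]  # descending order: most positive first
    return negative, positive


# Subset of the values listed above (leading spaces are part of the token).
scores = {
    " its": -1.05,
    " Its": -0.91,
    " it": -0.72,
    " celles": 1.25,
    " herself": 1.20,
    " she": 0.92,
}

negative, positive = top_logits(scores, k=2)
print(negative)  # [(' its', -1.05), (' Its', -0.91)]
print(positive)  # [(' celles', 1.25), (' herself', 1.2)]
```

    Sorting once and slicing from both ends keeps the two lists consistent with each other; a partial selection such as `heapq.nsmallest` / `heapq.nlargest` would avoid the full sort on a large vocabulary.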
    Activation Density: 0.035%
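    A density of 0.035% means the feature is active on roughly 35 of every 100,000 tokens. The sketch below shows that arithmetic, assuming "active" means an activation strictly above zero (the page does not state its threshold). The function `activation_density` and the activation values are hypothetical, for illustration only.

```python
# Minimal sketch: activation density as the share of token positions on
# which the feature fires.
# ASSUMPTION: a position counts as active when its activation is > 0.

def activation_density(activations: list[float]) -> float:
    """Fraction of positions with a positive activation."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Hypothetical data: 35 active positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(35):
    acts[i * 1000] = 1.0

print(f"{activation_density(acts):.3%}")  # 0.035%
```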

    No Known Activations