INDEX
    Explanations

    technical terms for errors

    phrases indicating causation or choice

    Negative Logits
    " which"          -1.54
    "which"           -1.51
    "Which"           -1.44
    " Which"          -1.43
    " WHICH"          -1.31
    " laquelle"       -1.05
    " quale"          -1.00
    " cui"            -0.98
    " cual"           -0.96
    " lesquelles"     -0.93

    Positive Logits
    " that"            1.48
    "that"             0.91
    " bahwa"           0.86
    " rằng"            0.86
    " bahawa"          0.71
    " kwamba"          0.70
    " ότι"             0.67
    " że"              0.64
    " That"            0.61
    " thut"            0.59

    Act Density 4.984%

    No Known Activations