INDEX
    Explanations

    phrases that introduce examples or list components

    Negative Logits
     Anſ        -0.63
     another    -0.62
     cà         -0.60
     مشين       -0.59
    تاريخ       -0.59
     Jefus      -0.58
     Conſ       -0.58
     Diſ        -0.57
     ſen        -0.56
    Ύ           -0.56
    Positive Logits
     including  0.88
    voorbeeld   0.87
    INCLUDING   0.86
     zoals      0.86
     like       0.86
    including   0.82
     telles     0.82
     např       0.81
    Including   0.79
     INCLUDING  0.77
    Activation Density 0.152%

    No Known Activations