INDEX
    Explanations

    punctuation marks and expressions of discomfort or hesitation

    Negative Logits
     it
    -0.27
    它
    -0.23
    It
    -0.22
     It
    -0.19
    ，它
    -0.18
     оно
    -0.18
    ,it
    -0.18
     nó
    -0.18
    [it
    -0.16
    рати
    -0.15
    Positive Logits
     if
    0.23
     personally
    0.22
     If
    0.22
     whenever
    0.20
     given
    0.18
     when
    0.18
    _if
    0.18
     jika
    0.18
    如果
    0.17
     anything
    0.17
    Act Density 0.038%

    No Known Activations