INDEX
    Explanations

    phrases indicating mention or reference to significant subjects or topics

    Negative Logits
    Token                  Logit
    "AndEndTag"            -0.90
    "enderror"             -0.90
    "Hentet"               -0.88
    " autorytatywna"       -0.88
    " виправивши"          -0.85
    "principalColumn"      -0.85
    "ześnie"               -0.83
    "تقاوى"                -0.82
    "tagHelperRunner"      -0.81
    "LEncoder"             -0.81
    Positive Logits
    Token                  Logit
    " stand"               0.53
    " plomb"               0.52
    "gar"                  0.51
    "zet"                  0.51
    "ins"                  0.50
    "stand"                0.49
    "umpe"                 0.47
    "trin"                 0.47
    " kost"                0.46
    "lar"                  0.45
    Activation Density: 0.032%

    No Known Activations