    Explanations

    questions about programming and style

    Negative Logits
    "</h1>"            -0.80
    " assistants"      -0.78
    "Срок"             -0.76
    "ємо"              -0.76
    "кает"             -0.75
    " almost"          -0.75
    " immune"          -0.74
    " only"            -0.73
    " inflammation"    -0.73
    "salut"            -0.72
    Positive Logits
    (token missing)     0.82
    "打击"              0.77
    " auftreten"        0.77
    " какой"            0.76
    "yant"              0.73
    " stator"           0.73
    "->$"               0.73
    "küche"             0.72
    " дорого"           0.71
    " zusammenge"       0.71
    Activation Density 0.000%

    No Known Activations