    Explanations

    question prompts and instructions

    Negative Logits
    " but"  0.92
    "それを"  0.82
    " ovviamente"  0.82
    " αλλά"  0.80
    " évidemment"  0.80
    " όχι"  0.79
    " ಅದು"  0.79
    " фактически"  0.78
    " 거고"  0.77
    " essentially"  0.76
    Positive Logits
    " Suppose"  1.31
    "Suppose"  1.24
    " When"  1.16
    " During"  1.11
    "When"  1.10
    "During"  1.04
    " After"  1.03
    " Podczas"  1.01
    " Imagine"  1.00
    " Following"  0.97
    Activation Density 0.334%

    No Known Activations