    Explanations

    phrases questioning the logic and efficacy of ideas and actions

    Negative Logits
    "apol"         -0.17
    "άνÏī"         -0.16
    "alink"        -0.16
    "')=="         -0.15
    "ÐIJÑĢÑħÑĸв"    -0.14
    "erif"         -0.14
    " canon"       -0.14
    "canonical"    -0.14
    "надлеж"       -0.14
    "chein"        -0.14
    Positive Logits
    " could"       1.09
    "could"        0.96
    " Could"       0.93
    "Could"        0.88
    " kunne"       0.55
    " могли"       0.54
    " konnte"      0.51
    " могла"       0.51
    " CO"          0.51
    " мог"         0.46
    Activation Density: 0.464%

    No Known Activations