    Explanations

    why, how, describe, explain

    Negative Logits
    ибо           0.50
    exiled        0.50
    jaanu         0.49
    politika      0.49
    bisnis        0.49
    nobility      0.49
    mohabbat      0.47
    laissant      0.47
    zahval        0.47
    hakk          0.47
    Positive Logits
    Using         0.54
    ideas         0.51
    Examples      0.51
    (blank)       0.50
    Describe      0.50
    terminology   0.50
    different     0.49
    Explain       0.49
    what          0.48
    outcomes      0.47
    Activation Density: 0.002%

    No Known Activations