INDEX
    Explanations

    "Okay," explanation starter phrase

    Negative Logits
    Token           Logit
    "6"             1.04
    "theater"       1.01
    "7"             0.96
    "2"             0.95
    "bagian"        0.95
    "5"             0.94
    "foss"          0.94
    "astu"          0.93
    " втори"        0.93
    "8"             0.93
    Positive Logits
    Token           Logit
    " extending"    1.32
    " Extended"     1.13
    " leveraging"   1.10
    " extended"     1.06
    " Nox"          1.06
    " extend"       1.05
    " reinforcing"  1.05
    " granting"     1.05
    " retrieve"     1.03
    " malicious"    1.02
    Activation Density: 0.011%

    No Known Activations