    Explanations

    ' or * followed by numbers/letters

    Negative Logits
    ↵↵      0.38
            0.30
    ."      0.27
    ↵↵↵     0.27
            0.26
    .”      0.26
    ".      0.26
            0.25
            0.25
    }$.     0.24
    Positive Logits
     Instead      0.33
     Unlike       0.33
    Unlike        0.27
     During       0.27
     Firstly      0.26
     While        0.26
     Although     0.25
     It           0.25
     Öncelikle    0.25
     In           0.25
    Act Density 1.009%

    No Known Activations