INDEX
    Explanations

    quotation marks and dialogue formatting

    Negative Logits
    "ourke"     -0.16
    "stoff"     -0.15
    "inine"     -0.14
    "utters"    -0.14
    "switch"    -0.14
    ".extra"    -0.14
    "hay"       -0.14
    "UBL"       -0.14
    "andler"    -0.14
    "542"       -0.13
    Positive Logits
    " jus"      0.15
    "up"        0.15
    " Harr"     0.14
    " neither"  0.14
    "org"       0.14
    "edu"       0.14
    " dem"      0.13
    " arts"     0.13
    "eth"       0.13
    " scaler"   0.13
    Activation Density: 0.102%

    No Known Activations