    Negative Logits
    " pleaſure"         -0.75
    " ſtate"            -0.74
    "ſelf"              -0.73
    " Majefty"          -0.73
    " NavController"    -0.70
    " itſelf"           -0.68
    " faſt"             -0.68
    " ſta"              -0.68
    " houſe"            -0.67
    " ſche"             -0.66
    Positive Logits
    "for"               1.18
    "For"               0.78
    "FOR"               0.76
    "foreach"           0.71
    " For"              0.71
    " FOR"              0.63
    "ForEach"           0.60
    "forEach"           0.59
    "ForAll"            0.57
    " every"            0.53
    Activation Density: 0.008%

    No Known Activations