INDEX
    Explanations

    occurrences of the word "one"

    Negative Logits
    Token       Logit
    "zos"       -0.74
    "keepers"   -0.72
    "LESS"      -0.68
    "ocracy"    -0.65
    "cats"      -0.62
    "letters"   -0.61
    "ursed"     -0.61
    "keeper"    -0.60
    "brates"    -0.59
    "IVES"      -0.58
    Positive Logits
    Token        Logit
    " glance"    1.15
    " behest"    0.89
    " moment"    0.88
    " apiece"    0.86
    " point"     0.85
    "point"      0.83
    " expense"   0.81
    " end"       0.78
    " instance"  0.78
    " level"     0.77
    Activation Density: 0.009%

    No Known Activations