INDEX
    Explanations

    the word "UR" with varying levels of activation

    occurrences of the abbreviation "UR"

    Negative Logits
    Token          Logit
    " Schultz"     -0.74
    "olean"        -0.72
    " Sands"       -0.72
    " Kissinger"   -0.70
    "etts"         -0.69
    "xon"          -0.68
    "makers"       -0.67
    " hed"         -0.67
    "notes"        -0.65
    " Eb"          -0.64
    Positive Logits
    Token          Logit
    "UR"            1.10
    "POSE"          1.08
    "BLE"           1.02
    "GER"           0.99
    "ARCH"          0.97
    "OPE"           0.94
    " confir"       0.93
    "AGE"           0.93
    "pees"          0.92
    "DER"           0.91
    Activation Density: 0.005%

    No Known Activations