INDEX
    Explanations

    references to the word "rum" at various strengths

    Negative Logits
    lihood       -0.85
     Parenthood  -0.84
    KI           -0.65
     Shades      -0.64
     Kut         -0.61
     Wildcats    -0.61
     Egyptians   -0.61
     Stafford    -0.61
     Cancel      -0.61
    Merit        -0.60
    Positive Logits
    inating      1.01
     rum         1.00
    oured        0.99
    ble          0.98
    atis         0.93
    pled         0.91
    rum          0.85
    inally       0.84
    inate        0.83
    ination      0.83
    Act Density 0.009%

    No Known Activations