    NEGATIVE LOGITS
    Token          Logit
    " Ridge"       -0.07
    (missing)      -0.07
    " Genesis"     -0.07
    "341"          -0.07
    "[position"    -0.06
    "Ord"          -0.06
    "AGE"          -0.06
    " widow"       -0.06
    " Meat"        -0.06
    "aller"        -0.06
    POSITIVE LOGITS
    Token          Logit
    " dub"          0.14
    " Dub"          0.14
    "Dub"           0.14
    "dub"           0.12
    " Dublin"       0.10
    " dubious"      0.09
    "ubl"           0.09
    " dubbed"       0.09
    " pub"          0.08
    " xcb"          0.07
    Activation Density: 0.002%

    No Known Activations