INDEX
    Negative Logits
    Token      Logit
     Lebanon   -0.08
     Prov      -0.08
               -0.08
     Puff      -0.08
    (World     -0.07
    _pickle    -0.07
     Puffy     -0.07
     Cait      -0.07
     Muff      -0.07
     fulf      -0.07
    Positive Logits
    Token      Logit
    644        0.15
    472        0.15
    653        0.13
    652        0.13
    654        0.13
    470        0.12
    473        0.12
    973        0.12
    47         0.11
    972        0.11
    Act Density 0.070%

    No Known Activations