    Negative Logits
    Token           Logit
    ' Flavoring'    -0.80
    'bleacher'      -0.74
    'pmwiki'        -0.69
    'zbollah'       -0.68
    ' dstg'         -0.68
    'Netflix'       -0.67
    '£ı'            -0.65
    ' lapt'         -0.64
    'Merit'         -0.63
    'Gaza'          -0.62
    Positive Logits
    Token           Logit
    ' latter'       0.86
    ' him'          0.72
    ' he'           0.61
    '"],'           0.58
    '>('            0.56
    ' she'          0.55
    ' Frenchman'    0.54
    ' such'         0.54
    ' both'         0.53
    ' hers'         0.52
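    The two logit lists rank vocabulary tokens by how strongly this feature pushes their next-token logits down or up. A common way such lists are derived (an assumption here; this page does not say how its values were computed) is to project the feature's decoder direction through the model's unembedding matrix and keep the k most negative and most positive scores. The sketch below does this in plain Python. `logit_effects`, `decoder_dir`, and `w_unembed` are hypothetical names, and the toy numbers are illustrative, not this feature's weights.

```python
# Hypothetical sketch: rank vocabulary tokens by the direct logit effect of a
# feature's decoder direction (decoder_dir projected through w_unembed).
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top_negative, top_positive) lists of (token, score) pairs.

    decoder_dir: feature direction in the residual stream, length d_model.
    w_unembed: unembedding matrix stored as d_model rows of length len(vocab).
    """
    d_model = len(decoder_dir)
    # score[t] = sum_i decoder_dir[i] * w_unembed[i][t]
    scores = [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]          # most negative score first
    top_positive = ranked[::-1][:k]    # most positive score first
    return top_negative, top_positive


# Toy usage with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = [" he", " she", "Gaza", "Netflix"]
decoder_dir = [1.0, 0.5]
w_unembed = [
    [0.6, 0.4, -0.5, -0.2],
    [0.2, 0.3, -0.3, -0.8],
]
neg, pos = logit_effects(decoder_dir, w_unembed, vocab, k=2)
for token, score in neg + pos:
    print(repr(token), round(score, 2))
```

    Sorting once and reading both ends keeps the example short; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nsmallest` / `heapq.nlargest`) avoids a full sort.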
    Activation Density: 1.136%

    No Known Activations
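    Activation density is usually reported as the share of sampled token positions on which a feature fires. The sketch below assumes "fires" means a strictly positive activation; the page states neither the threshold nor the sample size. `activation_density` is a hypothetical helper, and the toy input is chosen only so that 3 firing positions out of 264 reproduce a 1.136% figure.

```python
# Hypothetical sketch: activation density as the percentage of token
# positions whose feature activation is strictly positive (assumed threshold).
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Return the percentage of positions with activation > 0."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Toy example: the feature fires on 3 of 264 token positions.
acts = [0.0] * 261 + [1.2, 0.4, 2.5]
print(f"Activation Density: {activation_density(acts):.3f}%")
```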