INDEX
    Explanations

    phrases related to specific names or proper nouns, particularly "Ben" or variations of it

    proper nouns, specifically names and titles

    Negative Logits
    oslav       -0.79
    FORMATION   -0.67
    Interested  -0.65
     VIDE       -0.65
    olor        -0.64
     veins      -0.64
    ngth        -0.63
    é¾įåĸļ士     -0.63
    glers       -0.62
     constitu   -0.61
    Positive Logits
    rama        0.79
    heid        0.72
    gui         0.71
    ilot        0.71
    ente        0.70
    aline       0.69
    fen         0.67
    ij士         0.66
    zeb         0.66
    alion       0.66
    Activation Density: 0.071%

    No Known Activations