    Explanations

    mentions of the word "multiple."

    Negative Logits
    Token         Logit
    "Dialogue"    -0.71
    "instead"     -0.69
    "iquette"     -0.68
    "ESE"         -0.66
    "NER"         -0.65
    "NEY"         -0.64
    "brance"      -0.64
    "nothing"     -0.63
    "atown"       -0.63
    " Notting"    -0.63
    Positive Logits
    Token             Logit
    " sclerosis"      1.79
    "xes"             1.60
    " iterations"     1.24
    " layers"         1.16
    " occasions"      1.13
    " versions"       1.11
    " generations"    1.11
    " viewpoints"     1.11
    " instances"      1.10
    " perspectives"   1.10
    Act Density 0.038%

    No Known Activations