    Explanations

    Activates on text snippets in which the word "no" is a prevalent theme.

    Negative Logits
    Token        Logit
    "RAFT"       -0.84
    "romy"       -0.65
    "mosp"       -0.62
    "jet"        -0.59
    "ousand"     -0.56
    "inese"      -0.56
    "rex"        -0.55
    "ahime"      -0.55
    "nesses"     -0.55
    "encia"      -0.54

    Positive Logits
    Token        Logit
    "xious"      1.30
    " longer"    1.20
    " matter"    0.93
    " doubt"     0.91
    "ct"         0.90
    "obs"        0.83
    "except"     0.80
    "xus"        0.80
    "icably"     0.79
    "otrop"      0.78

    Activation Density: 0.069%

    No Known Activations