    Explanations

    writing and fun

    This neuron detects adjectives and adverbs that specify the assistant’s desired tone or style (e.g., “funny,” “edgy,” “like”).

    Negative Logits
    Token          Logit
    " copied"      -0.06
    "xfe"          -0.06
    " Cleans"      -0.06
    "_material"    -0.06
    "opic"         -0.05
    "quirer"       -0.05
    " obten"       -0.05
    " Гол"         -0.05
    " Kiş"         -0.05
    "enville"      -0.05
    Positive Logits
    Token          Logit
    " sea"         0.08
    " những"       0.07
    " hedge"       0.07
    "是一个"       0.07
    ".states"      0.06
    " Điều"        0.06
    " enim"        0.06
    "*'"           0.06
    " زمینه"       0.06
    "July"         0.06
    Activation Density: 0.003%

    No Known Activations