INDEX
    Explanations

    "beloved" followed by a noun

    NEGATIVE LOGITS
    ل           1.61
    ن           1.36
    ر           1.20
    ب           1.20
    h           1.16
                1.04
    л           1.02
    a           0.98
    r           0.96
                0.96
    POSITIVE LOGITS
    زين         0.98
    schutz      0.93
    EM          0.89
     beloved    0.88
     adored     0.88
    de          0.87
    ě           0.85
     bodo       0.84
    ни          0.84
     marred     0.83
    Act Density 0.002%

    No Known Activations