INDEX
    Explanations

    phrases indicating personal relationships and emotions

    Negative Logits
    室           -0.16
    ELLOW       -0.15
    ucht        -0.15
    ataka       -0.15
    qli         -0.15
    issance     -0.15
    raf         -0.14
    manship     -0.14
    way         -0.13
     Dep        -0.13
    Positive Logits
    erge        0.15
    lid         0.14
    ipher       0.14
     any        0.14
    iph         0.14
    isser       0.14
     mere       0.14
    ouro        0.14
    passes      0.14
     أي         0.14
    Activation Density: 0.145%

    No Known Activations