INDEX
    Explanations

    names of people, particularly in contexts related to personal stories or experiences

    Negative Logits
    -INF        -0.17
    ÑĤÑİ        -0.16
    ureen       -0.16
    _ASSUME     -0.16
     frau       -0.15
     iddi       -0.14
     aras       -0.14
    «ĺ          -0.14
     frauen     -0.14
    ustain      -0.14
    Positive Logits
    's          0.21
    ’s          0.20
     from       0.19
     &          0.19
     B          0.18
     and        0.18
     the        0.18
     O          0.17
     T          0.17
     R          0.17
    Act Density 0.664%

    No Known Activations