INDEX
    Explanations

    instances of familial relationships and social dynamics

    Negative Logits
    "orian"       -0.19
    "æĢĸ"         -0.14
    "INUX"        -0.14
    ".ret"        -0.14
    "ells"        -0.14
    "iÄĻ"         -0.14
    "kinson"      -0.13
    "izo"         -0.13
    "lest"        -0.13
    " Wich"       -0.13

    Positive Logits
    " ignore"      0.45
    " ignored"     0.45
    " disreg"      0.44
    " defiance"    0.44
    " disob"       0.44
    " disregard"   0.43
    " ignoring"    0.43
    " ignores"     0.40
    " def"         0.38
    "ignore"       0.38
    Activation Density: 0.317%

    No Known Activations