INDEX
    Explanations

    phrases indicating neglect or disregard

    Negative Logits
    Token                  Logit
    "V"                    -0.66
    " Roderick"            -0.65
    " fær"                 -0.64
    " AssemblyProduct"     -0.64
    "T"                    -0.63
    "vel"                  -0.61
    "d"                    -0.61
    "B"                    -0.61
    " AppCompat"           -0.61
    "S"                    -0.60
    Positive Logits
    Token                  Logit
    " ignore"              1.50
    " ignored"             1.50
    " ignoring"            1.49
    " ignores"             1.42
    " Ignore"              1.42
    "gnore"                1.32
    "Ignored"              1.28
    "ignore"               1.26
    " ignor"               1.25
    "ignored"              1.22
    Activation Density: 0.109%

    No Known Activations