    Explanations

    phrases indicating inappropriate relationships or flirtation

    Head Attribution Weights
    Head  Weight
    0     0.01
    1     0.01
    2     0.05
    3     0.06
    4     0.08
    5     0.03
    6     0.06
    7     0.43
    8     0.04
    9     0.03
    10    0.09
    11    0.08
    Negative Logits
    Token            Logit
    "stadt"          -1.59
    "udeb"           -1.48
    "Rated"          -1.46
    "imil"           -1.40
    "dam"            -1.38
    " stressed"      -1.37
    " Fukushima"     -1.37
    "ighed"          -1.37
    "example"        -1.36
    " Auschwitz"     -1.35
    Positive Logits
    Token            Logit
    " pardon"        1.85
    " brink"         1.55
    " bandwagon"     1.54
    " forgiveness"   1.52
    " renewal"       1.51
    " flirt"         1.48
    " sponsorship"   1.47
    " acceptance"    1.46
    " solicitation"  1.41
    "appro"          1.39
    Activation Density: 0.001%
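    A density of 0.001% means the feature fires on roughly 1 in every 100,000 tokens. The sketch below shows that arithmetic, assuming density is defined as the fraction of token positions whose activation exceeds a threshold (here 0.0); the function name, the threshold, and the sample data are hypothetical illustrations, not the dashboard's actual code.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one active token out of 100,000 positions.
acts = [0.0] * 100_000
acts[42] = 3.2  # arbitrary positive activation at one position

density = activation_density(acts)
print(f"{density:.3%}")  # 0.001%
```

    At this sparsity, a small sample of text can easily contain zero activating tokens.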

    No Known Activations