INDEX
Explanations
phrases indicating inappropriate relationships or flirtation
Head Attr Weights
0:0.01
1:0.01
2:0.05
3:0.06
4:0.08
5:0.03
6:0.06
7:0.43
8:0.04
9:0.03
10:0.09
11:0.08
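As a quick way to read the weights above, the following minimal sketch picks out the head with the largest attribution weight and its share of the listed total. The variable names are illustrative, and it assumes each `index:weight` pair is a per-head attribution weight for this feature, which the listing itself does not state explicitly.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each entry is one attention head's attribution to this feature.
head_weights = {
    0: 0.01, 1: 0.01, 2: 0.05, 3: 0.06, 4: 0.08, 5: 0.03,
    6: 0.06, 7: 0.43, 8: 0.04, 9: 0.03, 10: 0.09, 11: 0.08,
}

# Head with the largest weight, and the sum of all listed weights.
top_head = max(head_weights, key=head_weights.get)
total = sum(head_weights.values())

# Fraction of the listed total carried by the top head.
top_share = head_weights[top_head] / total

print(f"top head: {top_head} (weight {head_weights[top_head]:.2f})")
print(f"sum of listed weights: {total:.2f}")
print(f"top head share of listed total: {top_share:.1%}")
```

The listed weights are rounded to two decimals, so their sum need not be exactly 1.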
Negative Logits
stadt       -1.59
udeb        -1.48
Rated       -1.46
imil        -1.40
dam         -1.38
stressed    -1.37
Fukushima   -1.37
ighed       -1.37
example     -1.36
Auschwitz   -1.35
Positive Logits
pardon        1.85
brink         1.55
bandwagon     1.54
forgiveness   1.52
renewal       1.51
flirt         1.48
sponsorship   1.47
acceptance    1.46
solicitation  1.41
appro         1.39
Activations Density 0.001%