Explanations
Phrases related to rules, regulations, and organizational processes
Head Attr Weights
Head   Weight
0      0.02
1      0.01
2      0.08
3      0.07
4      0.24
5      0.03
6      0.03
7      0.28
8      0.03
9      0.04
10     0.07
11     0.05
Negative Logits
rella        -1.78
amon         -1.72
pired        -1.72
riter        -1.70
arta         -1.60
orses        -1.58
reon         -1.57
verbs        -1.57
Adventure    -1.54
famous       -1.51
Positive Logits
fatigue         1.82
imbalance       1.71
troublesome     1.62
toxicity        1.61
defect          1.60
deficiencies    1.60
rash            1.59
wrongdoing      1.57
offending       1.54
withdrawals     1.50
Activation Density: 0.001%