Explanations
instances of potential conflicts and conflicts of interest
words associated with various types of conflicts
Negative Logits
Token           Logit
trak            -0.81
GC              -0.76
************    -0.75
GV              -0.73
Stud            -0.72
girls           -0.71
slow            -0.70
AA              -0.69
Pione           -0.68
DT              -0.68
Positive Logits
Token           Logit
conflicts       0.93
conflict        0.82
unresolved      0.81
between         0.78
situations      0.72
arising         0.71
resolves        0.71
conflicted      0.69
Conflict        0.69
misunderstand   0.69
Activations Density: 0.017%