INDEX
Explanations
negative attributes or situations
negative phrases or terms related to fear or danger
Negative Logits
adjusted      -0.86
UD            -0.83
adequate      -0.82
terminated    -0.81
sufficient    -0.80
credible      -0.79
probable      -0.77
effective     -0.77
weighted      -0.76
strongly      -0.76
Positive Logits
loving        1.58
hun           1.51
watching      1.50
eating        1.49
turned        1.49
hunter        1.45
hunt          1.44
themed        1.44
machine       1.42
thing         1.41
Activations Density 0.133%