INDEX
Explanations
instances where actions or decisions are being directed towards a specific goal or outcome
instances of the verb "put"
Negative Logits
externalActionCode    -0.98
ording                -0.66
riott                 -0.66
Examiner              -0.65
Austral               -0.65
uary                  -0.62
yan                   -0.62
cean                  -0.61
Sample                -0.60
BLE                   -0.59
Positive Logits
rid                   0.91
ongh                  0.90
together              0.90
forth                 0.88
tering                0.87
downs                 0.84
tin                   0.83
aside                 0.81
down                  0.80
tered                 0.77
Activations Density: 0.047%