INDEX
Explanations
proper nouns or names
phrases related to decision-making or options
Negative Logits
firsthand    -0.84
remorse      -0.73
impat        -0.69
whistle      -0.69
distress     -0.68
whistlebl    -0.67
warning      -0.67
Failure      -0.67
laughter     -0.65
alarm        -0.64
Positive Logits
grouping     0.97
grouped      0.96
swapped      0.95
divide       0.93
subtract     0.93
subdiv       0.93
split        0.90
split        0.89
releg        0.88
opted        0.87
Activations Density 0.634%