INDEX
Explanations
phrases related to decision-making and uncertainty
Negative Logits
Catalog      -0.97
thal         -0.78
lator        -0.77
oxide        -0.76
atari        -0.76
oola         -0.75
oven         -0.72
rites        -0.70
axter        -0.69
odder        -0.69
Positive Logits
soever       0.87
anyone       0.85
anybody      0.78
they         0.78
consciously  0.70
there        0.69
intentional  0.68
it           0.67
respondents  0.67
exactly      0.66
Activations Density: 0.958%