INDEX
Explanations
questions asking for opinions or perspectives on various topics
Negative Logits
arm                   -0.75
Ãį                    -0.72
Thompson              -0.72
¯¯¯¯¯¯¯¯              -0.70
River                 -0.70
Published             -0.70
aunder                -0.69
externalActionCode    -0.69
cycle                 -0.69
reen                  -0.68
Positive Logits
...?                  0.93
!?                    0.81
those                 0.76
?!                    0.73
?                     0.73
?:                    0.71
!?"                   0.70
fairness              0.70
protecting            0.70
grandchildren         0.68
Activations Density 0.019%