Explanations
questioning or expressing concern about something in a critical manner
phrases that question the consideration of others' needs or perspectives
Negative Logits
arm            -0.74
Ãį             -0.72
Published      -0.71
River          -0.70
¯¯¯¯¯¯¯¯       -0.70
ells           -0.69
Sher           -0.68
oppy           -0.68
Thompson       -0.68
imb            -0.68
Positive Logits
...?           0.90
!?             0.79
those          0.75
protecting     0.71
?!             0.70
fairness       0.69
!?"            0.68
grandchildren  0.68
?              0.67
?:             0.66
Activations Density 0.031%