INDEX
Explanations
rhetorical questions and statements related to accountability and social issues
Negative Logits
inan    -0.17
sock    -0.15
272     -0.15
instr   -0.14
å®ħ     -0.14
çīĩ     -0.14
shall   -0.13
룰      -0.13
utt     -0.13
abbr    -0.13
Positive Logits
oni     0.15
cui     0.15
lee     0.15
аÑĢÑĩ    0.15
given   0.14
LEE     0.14
Given   0.14
are     0.14
given   0.14
indre   0.14
Activations Density 0.097%