Explanations
phrases related to defense and justification in arguments
Negative Logits
shit         -0.17
[            -0.16
downfall     -0.15
Definitely   -0.15
fucked       -0.15
[b           -0.15
([           -0.15
Fuck         -0.15
[image       -0.15
[p           -0.14
Positive Logits
frankly           0.21
ought             0.19
extraordinarily   0.18
----↵             0.18
candid            0.18
uh                0.18
precisely         0.17
enormously        0.17
Chairman          0.16
Senator           0.16
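Token lists like these are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and ranking the resulting per-token logits. The page does not say how these values were computed, so the following is only a minimal pure-Python sketch under that assumption. `logit_effects`, `decoder_dir`, `unembed` and `vocab` are hypothetical names, and a real dashboard would do this with a matrix library over the full vocabulary.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical layout: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Project a (hypothetical) feature decoder direction onto an unembedding
    matrix and return the k most negative and k most positive (token, logit) pairs."""
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    # logits[t] = sum over i of decoder_dir[i] * unembed[i][t]
    logits = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by logit so the most negative tokens come first.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary (made-up numbers).
    neg, pos = logit_effects(
        decoder_dir=[1.0, -1.0],
        unembed=[[0.2, 0.0, 0.15], [0.0, 0.3, 0.05]],
        vocab=["frankly", "shit", "ought"],
        k=2,
    )
    print("negative:", neg)
    print("positive:", pos)
```

With the toy numbers, "shit" gets the most negative logit and "frankly" the most positive, mirroring the shape (not the values) of the lists above.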
Activation Density: 0.131%
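Activation density is usually reported as the percentage of token positions in a sample corpus on which the feature fires. The dashboard states neither its threshold nor its sample, so this minimal sketch assumes "fires" means an activation strictly above zero; `activation_density` is a hypothetical helper, not part of any particular library.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds `threshold`.

    The zero default threshold is an assumption, not taken from the dashboard.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Made-up sample: the feature fires on 131 of 100,000 token positions.
    sample = [1.0] * 131 + [0.0] * (100_000 - 131)
    print(f"{activation_density(sample):.3f}%")
```

Under this definition, a density of 0.131% corresponds to the feature firing on roughly 131 of every 100,000 tokens, i.e. a sparse feature.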