Explanations
phrases related to allegations and denials in legal contexts
Negative Logits
ænd           -0.15
Conclusion    -0.15
/Instruction  -0.14
imas          -0.14
Ralph         -0.14
Grü           -0.13
Rebellion     -0.13
elp           -0.13
Substitute    -0.13
rah           -0.13
Positive Logits
response      0.29
counter       0.28
counters      0.28
respond       0.27
counter       0.26
response      0.25
reply         0.25
rebut         0.24
-counter      0.24
Response      0.24
Activations Density: 0.238%