INDEX
Explanations
instances where someone did not directly respond or provide a specific answer
negative statements and denials
Negative Logits
Compass      -0.75
Kinnikuman   -0.73
Creat        -0.66
ranged       -0.64
EStream      -0.63
Pop          -0.62
Empires      -0.60
Intern       -0.60
Roses        -0.59
Reborn       -0.59
Positive Logits
specify      1.30
hesitate     1.26
mention      1.21
elaborate    1.17
disclose     1.16
deny         1.08
delve        1.04
explicitly   1.03
clarify      1.02
cite         1.02
Activations Density: 0.087%