Explanations
Statements or actions that are intentional and planned
Terms related to intentional and deliberate actions
Negative Logits
Tycoon         -0.83
Solitaire      -0.78
abies          -0.76
stars          -0.75
iao            -0.70
Dogs           -0.68
Locked         -0.67
orph           -0.67
zona           -0.67
Veter          -0.66
Positive Logits
intentional     0.98
effort          0.90
violation       0.87
omission        0.86
wrongdoing      0.85
deliberate      0.82
contradiction   0.79
undertaking     0.76
deception       0.76
attempt         0.76
Activation Density: 0.025%