Explanations
expressions related to goals, plans, and intentions
Negative Logits
Guard        -0.75
sis          -0.70
SPONSORED    -0.64
hook         -0.61
fax          -0.61
éĸ           -0.61
oran         -0.61
rm           -0.60
dating       -0.60
incl         -0.60
Positive Logits
simplicity   0.83
consistency  0.80
clarity      0.77
seamless     0.74
healthy      0.74
faire        0.74
imize        0.72
fairness     0.72
disruptive   0.72
awareness    0.71
Activations Density 0.299%