Explanations
phrases related to expressing thoughts, decisions, and actions
Negative Logits
ˈ               -0.66
surprisingly    -0.61
ortium          -0.59
utterstock      -0.58
utenberg        -0.57
Slate           -0.57
"$              -0.57
famously        -0.56
ostensibly      -0.55
"               -0.55
Positive Logits
)."     1.50
."      1.40
.''     1.37
'."     1.37
".      1.32
.'"     1.30
''.     1.28
',"     1.26
]."     1.25
),"     1.25
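In dashboards of this kind, the negative and positive logit lists are usually read off by projecting the feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token weights. The sketch below illustrates that idea on toy numbers. It assumes a decoder vector `w_dec` of length d_model and an unembedding `W_U` stored as d_model rows of vocab-size columns; the function names, variable names, and values are hypothetical, and a real pipeline may apply layer-norm scaling or other preprocessing that is omitted here.

```python
from typing import List, Sequence, Tuple


def logit_weights(w_dec: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: project a decoder direction onto the unembedding.

    w_dec has length d_model; W_U has d_model rows, each of length vocab_size.
    Entry t of the result is sum_d w_dec[d] * W_U[d][t].
    """
    vocab_size = len(W_U[0])
    return [sum(w * row[t] for w, row in zip(w_dec, W_U))
            for t in range(vocab_size)]


def top_bottom(weights: Sequence[float], vocab: Sequence[str],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, weight) pairs."""
    ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens (all values made up).
vocab = ["A", "B", "C", "D"]
w_dec = [1.0, -0.5]
W_U = [[0.2, -1.0, 0.5, 0.0],
       [0.4, 0.0, -1.0, 3.0]]

weights = logit_weights(w_dec, W_U)
negative, positive = top_bottom(weights, vocab, k=2)
print(negative)  # [('D', -1.5), ('B', -1.0)]
print(positive)  # [('C', 1.0), ('A', 0.0)]
```

Under this reading, a large positive weight such as `)."` at 1.50 means that adding the feature's direction to the residual stream directly raises that token's logit, while tokens like "surprisingly" are pushed down.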
Activations Density 0.845%
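The density figure is conventionally the share of sampled token positions on which the feature fires. The minimal sketch below assumes that "fires" means an activation strictly above zero. The function name, the `threshold` parameter, and the token counts are hypothetical illustrations, not the dashboard's actual code.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of positions with activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: the feature fires on 845 of 100,000 token positions.
acts = [0.0] * (100_000 - 845) + [1.0] * 845
print(activation_density(acts))  # 0.845
```

A density below 1% like this one indicates a sparse feature, active on fewer than one token in a hundred.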