INDEX
Explanations
requests for actions or inquiries directed towards others
Negative Logits
ighth     -0.08
zung      -0.07
osas      -0.07
pillar    -0.07
enas      -0.07
Editable  -0.07
370       -0.06
geh       -0.06
plies     -0.06
omor      -0.06
Positive Logits
consideration  0.11
acceptance     0.09
forgiveness    0.09
readers        0.08
attention      0.08
consider       0.08
allowance      0.08
reader         0.08
attention      0.08
Consider       0.08
Activations Density 0.039%