Explanations
phrases indicating the effectiveness or appropriateness of choices and strategies
Negative Logits
indsight    -0.17
various     -0.16
better      -0.15
YRO         -0.14
arine       -0.14
more        -0.13
IEW         -0.13
anches      -0.13
cope        -0.13
Affairs     -0.13
Positive Logits
combination  0.32
kind         0.30
amount       0.30
kinds        0.25
combination  0.25
mix          0.25
balance      0.25
thing        0.25
amount       0.24
/right       0.24
Activations Density: 0.053%