INDEX
Explanations
verbs expressing likelihood or preference
phrases that indicate tendencies or patterns in behavior
Negative Logits
yet         -0.81
hello       -0.74
bats        -0.73
teen        -0.67
raq         -0.66
ATS         -0.66
Ready       -0.65
spection    -0.64
Valid       -0.64
lights      -0.63
Positive Logits
prioritize     1.32
underestimate  1.27
behave         1.22
concentrate    1.20
specialize     1.20
emphasize      1.19
prefer         1.18
accumulate     1.17
overest        1.16
rely           1.16
Activations Density 0.100%
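
The page does not define the density figure. A common reading, assumed here, is that it is the fraction of sampled tokens on which the feature has a nonzero activation, so 0.100% would mean roughly one token in a thousand. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy activation list are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (illustrative only): the feature fires on 1 token out of 1000.
toy = [0.0] * 999 + [2.5]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a low density is consistent with a feature that fires only on a narrow class of contexts, such as the verbs listed under Positive Logits.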