INDEX
Explanations
phrases indicating levels of engagement or involvement
Negative Logits
Token     Logit
oad       -0.16
apa       -0.15
ucher     -0.15
seg       -0.15
sort      -0.15
kind      -0.14
afs       -0.14
ask       -0.14
macros    -0.13
elf       -0.13
Positive Logits
Token         Logit
operation     0.25
operation     0.21
effort        0.18
action        0.18
illes         0.17
activity      0.17
coverage      0.17
expression    0.17
operations    0.17
affairs       0.17
Activations Density 0.241%