INDEX
Explanations
instances of movement or wandering
Negative Logits
oa         -0.15
Bench      -0.15
roti       -0.15
Hoover     -0.15
ighbors    -0.15
coon       -0.14
ILA        -0.14
ARP        -0.14
acher      -0.14
Grow       -0.14
Positive Logits
freely     0.16
391        0.15
/up        0.15
mf         0.14
467        0.14
alleg      0.14
361        0.14
866        0.14
talk       0.14
rup        0.14
Activations Density: 0.067%