INDEX
Explanations
conjunctions indicating connection or relationship between ideas or entities
Negative Logits
works       -0.30
Works       -0.30
ework       -0.28
WORK        -0.27
workload    -0.27
Works       -0.25
works       -0.24
-work       -0.24
work        -0.24
_work       -0.24
Positive Logits
play        0.28
leisure     0.22
Play        0.22
study       0.19
art         0.18
/play       0.18
PLAY        0.18
Leisure     0.18
Play        0.17
PLAY        0.16
Activations Density: 0.012%