Explanations
instances of playfulness or experimentation
Negative Logits
Token            Logit
înal             -0.62
vérit            -0.60
stateProvider    -0.57
EconPapers       -0.56
barui            -0.55
survey           -0.55
silenzio         -0.54
UnusedPrivate    -0.54
jante            -0.53
gemens           -0.53
Positive Logits
Token              Logit
tinkering          1.17
experimenting      1.03
manipulations      1.02
manipulating       1.02
experimentation    0.99
manipulation       0.98
tinker             0.98
manip              0.98
manipulate         0.97
toys               0.96
Activations Density: 0.205%