INDEX
Explanations
words related to testing, trying out, or exploring different options and ideas
terms related to experimentation and testing
Negative Logits
Token     Logit
Cheong    -0.75
mary      -0.68
CLOSE     -0.66
games     -0.66
olulu     -0.65
ens       -0.64
Calling   -0.64
die       -0.64
si        -0.63
BN        -0.62
Positive Logits
Token              Logit
experimentation    1.26
experimenting      1.21
experimented       1.13
experiments        0.94
tink               0.88
imental            0.88
withd              0.86
experiment         0.86
aults              0.85
Experiment         0.81
Activations Density: 0.007%