INDEX
Explanations
references to YouTube and related terminology
Negative Logits
rog        -0.17
Tweets     -0.17
hu         -0.16
Tweet      -0.16
994        -0.16
omit       -0.15
lov        -0.15
_patches   -0.15
tek        -0.15
.inputs    -0.14
Positive Logits
sensation      0.21
sensations     0.21
channel        0.18
tube           0.18
outu           0.17
personalities  0.16
tube           0.16
-channel       0.16
channels       0.15
algo           0.15
Activations Density: 0.007%