INDEX
Explanations
YouTube video URLs characterized by a specific format
Neuron Alignment
Index   Value   % of L₁
1343    +0.17   0.5%
924     +0.11   0.3%
609     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
924     +0.17      0.02
1343    +0.11      0.02
1478    +0.08      0.01
Negative Logits
unspeak     -0.95
Juf         -0.85
apprehen    -0.85
vainly      -0.85
coö         -0.84
quitted     -0.84
gaily       -0.83
shenan      -0.82
impelled    -0.78
pooh        -0.77
Positive Logits
<bos>          0.72
Искәрмәләр     0.62
Gdy            0.60
שוליים         0.60
v              0.58
otomatig       0.53
borderSide     0.51
zzleHttp       0.49
v              0.49
Kdo            0.48
Activations Density 0.023%