INDEX
Explanations
verb phrases related to human activities, strategies, and community interactions
Neuron Alignment
Index    Value    % of L₁
1967     +0.14    0.4%
674      +0.08    0.2%
507      +0.07    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1307     +0.14       0.02
845      +0.08       0.03
1776     +0.07       0.02
Negative Logits
InjectAttribute     -0.85
bezeichneter        -0.77
InstrumentedTest    -0.60
UnusedPrivate       -0.60
MessageOf           -0.60
ItemBackground      -0.60
كومونز              -0.58
fjspx               -0.58
SourceChecksum      -0.58
unknownFields       -0.57
Positive Logits
Lma       0.85
scrat     0.85
Lmfao     0.80
Ikr       0.78
FTFY      0.78
eyel      0.73
Yeet      0.73
UwU       0.73
perfet    0.71
milf      0.70
Activations Density: 0.239%