Explanations
locations such as parks, cities, and studios
Neuron Alignment
Index   Value   % of L₁
1343    +0.27   1.0%
184     +0.25   0.9%
856     +0.16   0.6%
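The Neuron Alignment table appears to list the neurons with the largest weights in this feature's decoder vector. Here is a minimal sketch of how such a table could be produced. It assumes that "% of L₁" is each component's magnitude divided by the L₁ norm of the whole decoder vector and that rows are ranked by magnitude; the page confirms neither. The function name `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec: list[float], k: int = 3) -> list[tuple[int, float, float]]:
    """Return (neuron index, signed weight, share of L1 norm) for the k largest-magnitude components.

    Hypothetical helper: assumes the feature's decoder vector is expressed in the
    neuron basis and that "% of L1" means |weight| / sum(|weights|).
    """
    l1_norm = sum(abs(w) for w in decoder_vec)
    # Rank neuron indices by the magnitude of their weight, largest first (ties keep index order).
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1_norm) for i in ranked[:k]]


# Toy 4-neuron decoder vector (made-up numbers, not this feature's weights).
for index, value, share in neuron_alignment([0.5, -2.0, 1.0, 0.5]):
    print(f"{index:<6}{value:+.2f}  {share:.1%}")
```

On a real model the decoder vector has one entry per neuron, so a component holding about 1% of the L₁ norm, as neuron 1343 does above, can still be the largest single entry.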
Correlated Neurons
Index   P. Corr.   Cos Sim.
1343    +0.27      0.04
227     +0.25      0.04
184     +0.16      0.02
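The Correlated Neurons table gives two similarity scores for each neuron. The sketch below implements the two standard formulas. It assumes that P. Corr. stands for Pearson correlation and that both columns compare the feature's activations with a neuron's activations over the same tokens; the page states neither. Pearson correlation centers each series before comparing them and cosine similarity does not, so the two columns can disagree. The function names and data are hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance of the centered series over the product of their spreads.

    Raises ZeroDivisionError if either series is constant (correlation undefined).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product of the raw (uncentered) vectors over the product of their L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Made-up activations of one feature and one neuron over the same six tokens.
feature_acts = [0.0, 0.0, 1.5, 0.0, 2.0, 0.0]
neuron_acts = [-0.1, 0.2, 0.9, -0.3, 1.1, 0.0]
print(f"P. Corr. {pearson(feature_acts, neuron_acts):+.2f}")
print(f"Cos Sim. {cosine_similarity(feature_acts, neuron_acts):.2f}")
```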
Negative Logits
<bos>            -0.56
betweenstory     -0.52
utafitiHapana    -0.50
)_/¯             -0.50
bezeichneter     -0.46
parsedMessage    -0.44
ArrowToggle      -0.44
UnusedPrivate    -0.44
ویکیپدیای        -0.43
writeFieldEnd    -0.42
Positive Logits
Lmao             0.63
Fuckin           0.60
lmfao            0.59
Xoxo             0.58
🤣🤣             0.57
minValue         0.56
Bullshit         0.56
😭😭             0.56
!...             0.54
Wtf              0.54
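The Negative Logits and Positive Logits lists presumably show the tokens whose logits this feature most lowers and most raises. One common way to compute such lists is to project the feature's decoder vector through the model's unembedding matrix and sort the resulting per-token scores. This page does not confirm that method, so treat it as an assumption. `logit_effects`, the toy matrix, and the choice to skip the final layer norm are all illustrative.

```python
def logit_effects(
    decoder_vec: list[float],
    unembed: list[list[float]],
    vocab: list[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs.

    Hypothetical helper: score[t] = sum_d decoder_vec[d] * unembed[d][t], i.e. the
    feature's direct effect on each token's logit, ignoring the final layer norm.
    """
    d_model = len(decoder_vec)
    scores = [sum(decoder_vec[d] * unembed[d][t] for d in range(d_model)) for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending by score
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy example: d_model = 2 and a four-token vocabulary (made-up weights).
neg, pos = logit_effects([1.0, 0.0], [[0.5, -1.0, 2.0, 0.0], [9.0, 9.0, 9.0, 9.0]], ["a", "b", "c", "d"], k=2)
print("Negative:", neg)
print("Positive:", pos)
```

The second row of the toy matrix has no effect because the decoder vector's second component is zero. Only directions the feature actually writes to can shift the logits.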
Activation Density: 0.137%
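Activation density is read here as the share of tokens on which the feature is active. On that reading, 0.137% means roughly 1.37 activating tokens per thousand. The sketch assumes that "active" means an activation strictly above zero, which the page does not spell out; the `threshold` parameter is a hypothetical knob.

```python
def activation_density(acts: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature fires (activation strictly above threshold)."""
    return sum(1 for a in acts if a > threshold) / len(acts)


# Made-up activations over eight tokens; two of them are nonzero.
density = activation_density([0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0])
print(f"Activation Density {100 * density:.3f}%")
```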