INDEX
Explanations
phrases related to interactive experiences and activities
Neuron Alignment
Index   Value   % of L₁
50      +0.18   1.0%
281     +0.11   0.6%
1124    +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
281     +0.18      0.03
316     +0.11      0.02
1416    +0.11      0.02
Negative Logits
<bos>            -2.95
/***             -0.80
/*++             -0.68
<?               -0.68
HasAnnotation    -0.63
Vegeu            -0.61
DeleteMapping    -0.60
Савезне          -0.60
件事情           -0.60
///**            -0.58
Positive Logits
Juf          1.33
stockholm    1.32
affor        1.30
Intere       1.23
Middles      1.23
scrat        1.22
Eft          1.21
Augu         1.21
desir        1.16
panama       1.16
Activations Density: 0.093%