Explanations
verbs related to actions or processes
Neuron Alignment
Index    Value    % of L₁
50       +0.22    1.0%
324      +0.09    0.4%
507      +0.08    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
507      +0.22       0.08
92       +0.09       0.07
166      +0.08       0.07
Negative Logits
<bos>           -3.01
///**           -0.70
Filmografie     -0.64
,               -0.63
.               -0.61
also            -0.61
public          -0.60
/***            -0.59
bezeichneter    -0.58
private         -0.58
Positive Logits
Juf          1.53
stockholm    1.51
eiffel       1.46
madonna      1.43
unlaw        1.43
toledo       1.42
squa         1.41
sovere       1.41
affor        1.38
unwarran     1.36
Activations Density 1.724%