Explanations
phrases related to legal matters and consequences
Neuron Alignment
Index   Value   % of L₁
50      +0.25   1.2%
1381    +0.09   0.4%
1993    +0.08   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1993    +0.25      0.04
1751    +0.09      0.04
1381    +0.08      0.04
Negative Logits
<bos>    -3.64
public   -0.75
andaag   -0.69
/*       -0.64
@        -0.64
//       -0.64
<eos>    -0.63
/**      -0.63
.        -0.59
↵↵       -0.59
Positive Logits
stockholm   1.53
Minang      1.44
maroc       1.44
desir       1.42
lidl        1.42
wien        1.41
sovere      1.41
aen         1.40
applau      1.39
lyon        1.38
Activations Density: 0.580%