Explanations
phrases related to judgement and evaluation
Neuron Alignment
Index   Value   % of L₁
1013    +0.10   0.3%
236     +0.07   0.2%
2025    +0.07   0.2%
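The Value and % of L₁ columns can be read as "how much of the feature's direction sits on a single neuron". The sketch below is one plausible way to produce such a table, not the dashboard's confirmed method: it assumes the feature is represented by a vector over neurons, ranks entries by magnitude, and reports each entry's share of the vector's L₁ norm. The function name `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(direction, k=3):
    """Return the k largest-magnitude entries as (index, value, share_of_l1).

    `direction` is assumed to be the feature's vector over neurons
    (an assumption; the source does not say how the table is computed).
    """
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        return []
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(direction)),
                    key=lambda i: abs(direction[i]),
                    reverse=True)
    return [(i, direction[i], abs(direction[i]) / l1) for i in ranked[:k]]


# Toy vector, for illustration only (not taken from the dashboard).
toy = [0.5, -0.25, 0.0, 0.25]
top = neuron_alignment(toy, k=2)
for index, value, share in top:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, shares as small as 0.2% to 0.3% for the top neurons would mean the feature is spread across many neurons rather than aligned with any single one.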
Correlated Neurons
Index   P. Corr.   Cos Sim.
1470    +0.10      0.04
1827    +0.07      0.04
208     +0.07      0.05
Negative Logits
unlaw       -1.42
guarante    -1.27
volunte     -1.26
panama      -1.26
swarovski   -1.25
encomp      -1.24
affor       -1.24
increa      -1.23
impra       -1.22
strick      -1.21
Positive Logits
choosing      0.68
leaving       0.62
bringing      0.61
assuming      0.59
allowing      0.59
considering   0.59
sending       0.58
letting       0.58
putting       0.57
having        0.57
Activations Density 0.344%
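A density figure like the one above is commonly the fraction of sampled tokens on which the feature fires at all. The sketch below illustrates that reading under stated assumptions: the helper `activation_density`, its `threshold` parameter, and the toy activations are hypothetical, and the dashboard may use a different sample or cutoff.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumes one activation value per sampled token; the threshold of 0.0
    is an assumption, not something stated by the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample: the feature fires on 1 of 8 tokens.
toy_activations = [0.0] * 7 + [1.2]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

On this interpretation, a density of 0.344% would correspond to the feature firing on roughly 3 or 4 tokens out of every 1,000 sampled.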