INDEX
Explanations
phrases related to taking action or making an effort to achieve something
Neuron Alignment
Index   Value   % of L₁
50      +0.20   1.2%
32      +0.11   0.7%
1950    +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1950    +0.20      0.06
950     +0.11      0.05
32      +0.11      0.05
Negative Logits
<bos>              -3.28
<?                 -0.73
/***               -0.73
ⓧ                  -0.71
(no token shown)   -0.64
/*++               -0.64
///**              -0.63
//---              -0.62
<>                 -0.62
Williams           -0.60
Positive Logits
stockholm   1.37
maneu       1.22
Khart       1.19
eiffel      1.19
lidl        1.16
Keny        1.15
emphat      1.13
sophie      1.13
frankfurt   1.12
thut        1.11
Activations Density 0.193%