INDEX
Explanations
phrases related to a commercial context like sales promotions, exchanges, and customer service
Neuron Alignment
Index   Value   % of L₁
50      +0.16   1.0%
1233    +0.10   0.6%
755     +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
276     +0.16      0.04
1895    +0.10      0.03
1233    +0.10      0.03
Negative Logits
<bos>        -3.18
public       -0.76
ⓧ            -0.74
/**          -0.73
protected    -0.68
char         -0.67
com          -0.67
stereotype   -0.66
int          -0.66
struct       -0.65
Positive Logits
ftu          1.95
fta          1.94
Juf          1.93
emphat       1.93
maneu        1.91
reluct       1.89
thut         1.89
stockholm    1.89
apprehen     1.87
increa       1.81
Activations Density 0.084%