INDEX
Explanations
mentions of specific roles or careers and personal interactions
Neuron Alignment
Index    Value    % of L₁
50       +0.10    0.4%
680      +0.07    0.3%
1271     +0.06    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
729      +0.10       0.03
707      +0.07       0.03
1544     +0.06       0.03
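The two columns under Correlated Neurons can disagree, as they do here (correlations of +0.06 to +0.10 against a cosine similarity of 0.03). A minimal sketch of how the two measures relate, assuming "P. Corr." abbreviates Pearson correlation and "Cos Sim." abbreviates cosine similarity; the page does not say which vectors are compared, so the function names and inputs below are hypothetical:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)
```

Because Pearson correlation centers each vector first, a constant offset changes the cosine similarity but not the correlation, which is one reason the two columns need not agree.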
Negative Logits
<bos>                   -1.63
/***                    -0.82
(token not rendered)    -0.77
//{                     -0.67
<?                      -0.62
/*                      -0.62
/**                     -0.60
public                  -0.59
ⓧ                       -0.59
<?                      -0.56
Positive Logits
leads      1.72
Leads      1.71
Leads      1.66
leads      1.58
jaya       1.25
saar       1.24
maroc      1.24
bandung    1.23
jawa       1.21
magis      1.17
Activations Density: 0.078%