INDEX
Explanations
adjectives and adverbs that emphasize qualities or states
Neuron Alignment
Index   Value   % of L₁
50      +0.39   2.1%
1967    +0.26   1.4%
1870    +0.12   0.6%
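
The "% of L₁" column above can be read as each neuron's share of the L₁ norm of the feature's weight vector. The page does not state this definition, so the sketch below is an assumption. The function name `l1_share` and the toy weights are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, top_k=3):
    """Return (index, value, share of L1 norm) for the top_k largest weights.

    Assumption: "% of L1" means |w_i| / sum_j |w_j| over the feature's
    weights on individual neurons. The source page does not confirm this.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful shares
    # Rank neurons by signed value, largest first, as the table appears to do.
    ranked = sorted(enumerate(weights), key=lambda iw: iw[1], reverse=True)
    return [(i, w, abs(w) / l1) for i, w in ranked[:top_k]]


if __name__ == "__main__":
    toy_weights = [0.5, -0.25, 0.25]  # hypothetical values
    for index, value, share in l1_share(toy_weights, top_k=2):
        print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, a small share for the top neuron (2.1% here) would suggest the feature is spread across many neurons rather than aligned with a single one.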
Correlated Neurons
Index   P. Corr.   Cos Sim.
1363    +0.39      0.11
624     +0.26      0.04
1870    +0.12      0.04
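
As a rough illustration of the two columns above, the sketch below computes a Pearson correlation and a cosine similarity between two equal-length vectors. It assumes "P. Corr." is the Pearson correlation and "Cos Sim." is the cosine similarity. The page does not say which vectors are compared (activations over tokens, weight vectors, or something else), so the inputs here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as cosine similarity of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


if __name__ == "__main__":
    feature_acts = [0.0, 1.0, 0.0, 2.0]  # hypothetical activations
    neuron_acts = [0.1, 0.9, 0.2, 1.5]   # hypothetical activations
    print(f"{pearson_correlation(feature_acts, neuron_acts):+.2f}")
    print(f"{cosine_similarity(feature_acts, neuron_acts):.2f}")
```

The two measures differ only in the mean-centering step, which is why a pair can score differently on each.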
Negative Logits
<bos>     -3.04
/*!       -0.92
<?        -0.89
ⓧ         -0.88
/***      -0.88
(blank)   -0.84
/**       -0.81
<?        -0.78
fputs     -0.69
/*++      -0.66
Positive Logits
bandung    1.30
Minang     1.28
jaya       1.25
lele       1.23
jawa       1.16
surabaya   1.06
seksi      1.05
malang     1.04
vne        1.03
alip       1.01
Activations Density 0.734%
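
One common reading of an activation density figure is the fraction of sampled tokens on which the feature fires. The page does not define the metric, so the sketch below is an assumption. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (here, strictly positive) activation. The source does not confirm this.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    sample = [0.0, 0.0, 0.5, 0.0]  # hypothetical per-token activations
    print(f"{activation_density(sample):.3%}")
```

On this reading, a density of 0.734% would mean the feature is active on fewer than one token in a hundred.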