INDEX
Explanations
occurrences of the word 'the' followed by a significant word
Neuron Alignment
Index   Value   % of L₁
50      +0.33   2.0%
1967    +0.11   0.7%
1892    +0.09   0.6%
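The "% of L₁" column is not defined on this page. One plausible reading, stated here as an assumption, is that it reports each neuron's absolute value as a share of the L₁ norm of the whole feature direction. A minimal Python sketch of that reading follows; the function name `l1_share` and the example vector are hypothetical and not taken from the dashboard.

```python
def l1_share(direction):
    """Return each component's share of the vector's L1 norm, as fractions.

    Assumption: "% of L1" means |component| / sum(|components|).
    """
    l1 = sum(abs(x) for x in direction)
    if l1 == 0:
        raise ValueError("direction has zero L1 norm")
    return [abs(x) / l1 for x in direction]


# Hypothetical 4-dimensional direction, chosen only for illustration.
direction = [0.5, -0.25, 0.125, 0.125]
shares = l1_share(direction)
print([f"{s:.1%}" for s in shares])
```

Under this reading the shares always sum to 100%, so a top neuron at 2.0% would indicate a direction spread across many neurons rather than aligned with a single one.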
Correlated Neurons
Index   P. Corr.   Cos Sim.
1967    +0.33      0.06
1942    +0.11      0.05
479     +0.09      0.05
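The two columns above measure different things. As a hedged sketch, assuming "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and "Cos Sim." is the cosine similarity between two weight vectors, the quantities could be computed as below. The helper names and sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical activations on four tokens: the neuron tracks the feature
# with a constant offset, so the correlation is perfect.
feature_acts = [0.0, 0.0, 1.0, 2.0]
neuron_acts = [1.0, 1.0, 2.0, 3.0]
corr = pearson(feature_acts, neuron_acts)

# Hypothetical weight vectors at a 45 degree angle to each other.
sim = cosine([1.0, 0.0], [1.0, 1.0])
print(round(corr, 3), round(sim, 3))
```

Pearson correlation subtracts the means before comparing, while cosine similarity does not, which is one reason the two columns can disagree for the same neuron.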
Negative Logits
Token                  Logit
<bos>                  -3.64
[token not rendered]   -1.00
ⓧ                      -0.94
/**                    -0.89
<?                     -0.88
<?                     -0.72
/*                     -0.70
/*!                    -0.65
contentLoaded          -0.64
/***                   -0.62
Positive Logits
Token     Logit
bandung   1.24
Minang    1.11
affor     1.09
maroc     1.09
perpé     1.05
seksi     1.05
jawa      1.04
gila      1.01
lele      1.01
Pekan     1.00
Activations Density 0.269%
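Activation density is commonly taken to be the fraction of sampled tokens on which the feature fires. A minimal sketch under that assumption, with a hypothetical threshold of zero and made-up activations (not the data behind the 0.269% figure):

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold."""
    if not acts:
        raise ValueError("acts must be non-empty")
    active = sum(1 for a in acts if a > threshold)
    return active / len(acts)


# Hypothetical sample: 1000 tokens, 4 of which activate the feature.
acts = [0.0] * 996 + [0.4, 1.2, 0.7, 2.5]
density = activation_density(acts)
print(f"{density:.3%}")
```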