INDEX
Explanations
the presence of the word "that" in various contexts
Neuron Alignment
Index   Value   % of L₁
50      +0.31   1.5%
161     +0.20   1.0%
1350    +0.10   0.5%
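The page does not define the "% of L₁" column. One plausible reading, stated here as an assumption, is that it reports each neuron's absolute weight as a share of the L₁ norm of the whole feature direction. The listed rows are at least consistent with that reading: 0.20 at 1.0% and 0.10 at 0.5% both imply an L₁ norm of roughly 20. A minimal sketch under that assumption, using a hypothetical toy vector rather than this feature's real weights:

```python
def percent_of_l1(weights):
    """Return each entry's absolute value as a percentage of the L1 norm.

    Assumption: this mirrors what the "% of L1" column reports; the
    dashboard itself does not document the formula.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [100.0 * abs(w) / l1_norm for w in weights]


# Hypothetical toy direction (NOT the real feature vector).
toy_direction = [0.3, -0.2, 0.1, 0.4]
shares = percent_of_l1(toy_direction)

for index, (weight, share) in enumerate(zip(toy_direction, shares)):
    print(f"neuron {index}: value {weight:+.2f}, {share:.1f}% of L1")
```

By construction the shares sum to 100%, so a top entry of only 1.5% would indicate that the feature's weight is spread across many neurons rather than concentrated in a few.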
Correlated Neurons
Index   P. Corr.   Cos Sim.
161     +0.31      0.11
1350    +0.20      0.06
1065    +0.10      0.06
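"P. Corr." and "Cos Sim." are presumably the Pearson correlation and the cosine similarity between this feature and each listed neuron; which quantities are being compared (activations over tokens, or weight vectors) is not stated on the page, so the inputs below are assumptions. A minimal sketch of the two metrics on hypothetical toy data:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical per-token activations for a feature and a neuron.
feature_acts = [0.0, 1.0, 2.0, 3.0]
neuron_acts = [0.1, 0.9, 2.2, 2.8]
print(f"Pearson: {pearson(feature_acts, neuron_acts):+.2f}")

# Hypothetical direction vectors for the same pair.
print(f"Cosine:  {cosine([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]):.2f}")
```

The two metrics answer different questions: Pearson subtracts the mean first and so measures co-variation, while cosine similarity compares raw directions. That is one reason the two columns can rank neurons differently.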
Negative Logits
<bos>                -2.64
ⓧ                    -0.51
[token not shown]    -0.51
splet                -0.50
/***                 -0.50
<?                   -0.49
///**                -0.48
transact             -0.48
praš                 -0.47
<?                   -0.46
Positive Logits
thut         1.03
Palembang    0.86
Minang       0.84
pollut       0.81
Jambi        0.78
maneu        0.77
bandung      0.76
frankfurt    0.76
quoique      0.76
sovere       0.75
Activations Density 0.312%