Explanations
rumors or speculation mentioned in the text
Neuron Alignment
Index   Value   % of L₁
1103    +0.08   0.3%
313     +0.08   0.3%
1137    +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1137    +0.08      0.03
532     +0.08      0.02
844     +0.07      0.02
Negative Logits
Token         Logit
<bos>         -1.17
/**           -0.61
return        -0.58
/**           -0.58
և             -0.57
const         -0.56
/***          -0.55
interface     -0.54
catch         -0.53
(not shown)   -0.52
Positive Logits
Token      Logit
rumor      2.51
rumour     2.42
rumors     2.39
rumours    2.27
Rumors     2.24
Rumors     1.97
rumored    1.89
rumoured   1.84
Rum        1.50
Rum        1.43
Activations Density 0.102%