INDEX
Explanations
mentions of current events and politics
Neuron Alignment
Index   Value   % of L₁
1842    +0.18   0.5%
394     +0.13   0.4%
1177    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
658     +0.18      0.09
862     +0.13      0.04
394     +0.10      0.07
Negative Logits
<bos>     -0.82
katun     -0.63
bambu     -0.60
lampa     -0.57
ekos      -0.56
jaya      -0.56
abstrak   -0.56
bahay     -0.54
pama      -0.54
bagay     -0.53
Positive Logits
indestru   0.94
unspeak    0.86
disreg     0.82
exorbit    0.79
shenan     0.79
inappro    0.79
ineffec    0.79
maneu      0.78
impra      0.78
indescri   0.76
Activations Density: 0.949%