INDEX
Explanations
dialogue lines enclosed in quotation marks
Neuron Alignment
Index   Value   % of L₁
674     +0.45   3.1%
108     +0.07   0.5%
1741    +0.07   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1403    +0.45      0.22
800     +0.07      0.26
15      +0.07      0.24
Negative Logits
reluct      -3.79
unspeak     -3.79
impra       -3.71
shenan      -3.66
disgra      -3.59
indestru    -3.58
disreg      -3.54
disagre     -3.51
increa      -3.49
inconce     -3.42
Positive Logits
<bos>             13.70
Autoritní         2.44
betweenstory      2.40
GEBURTSDATUM      2.34
expandindo        2.30
تقاوى             2.13
Paglinawan        2.11
Administrativna   2.08
autorytatywna     2.08
Italijani         2.07
Activations Density: 0.084%