Explanations
phrases related to thoughts and opinions
Neuron Alignment
Index   Value   % of L₁
1137    +0.10   0.3%
1065    +0.10   0.3%
878     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1515    +0.10      0.05
1065    +0.10      0.05
1137    +0.10      0.05
Negative Logits
Allister   -0.64
Leod       -0.58
Millan     -0.57
ಘ          -0.56
arrol      -0.53
Cormack    -0.53
Quem       -0.50
ėjimas     -0.49
Oof        -0.48
Permite    -0.48
Positive Logits
THOUGHT    0.98
THINK      0.97
scrat      0.96
affez      0.87
motorola   0.86
Think      0.86
shenan     0.85
thut       0.85
maneu      0.85
pollut     0.84
Activations Density 0.145%