INDEX
Explanations
first-person pronouns and the word "You" used in a message or conversation
Neuron Alignment
Index    Value    % of L₁
381      +0.15    0.5%
1510     +0.14    0.4%
1978     +0.12    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1919     +0.15       0.11
1510     +0.14       0.07
331      +0.12       0.07
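The two columns above, Pearson correlation and cosine similarity, can diverge for the same pair of vectors, because Pearson correlation subtracts each vector's mean first and cosine similarity does not. The sketch below illustrates that difference on made-up data. The function names and the sample vectors are hypothetical, and exactly which vectors the dashboard compares (activations, weights, or something else) is not stated here, so this is only an illustration of the two metrics, not the tool's own computation.

```python
import math


def pearson_corr(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cx = [a - mean_x for a in x]
    cy = [b - mean_y for b in y]
    return cosine_sim(cx, cy)


def cosine_sim(x, y):
    """Cosine similarity: dot product divided by the product of L2 norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("undefined for a zero (or constant, after centering) vector")
    return dot / (norm_x * norm_y)


# Hypothetical data: y is x shifted by a constant offset.
x = [1.0, 2.0, 3.0, 4.0]
y = [11.0, 12.0, 13.0, 14.0]

# Centering removes the offset, so Pearson is 1 (up to rounding),
# while the uncentered cosine similarity stays below 1.
print(pearson_corr(x, y), cosine_sim(x, y))
```

Defining Pearson correlation through the centered cosine keeps the relationship between the two metrics explicit: they agree only when both vectors already have zero mean.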
Negative Logits
monaster     -1.13
poliester    -1.06
sacerd       -0.95
persil       -0.92
cristi       -0.91
bewerken     -0.90
susun        -0.90
torba        -0.88
quoc         -0.88
ananas       -0.88
Positive Logits
endeavouring    0.80
unspeak         0.76
resear          0.73
sophistic       0.73
impelled        0.71
don             0.71
practition      0.71
cannot          0.69
shouldn         0.68
didn            0.68
Activations Density 0.375%