INDEX
Explanations
the words "I", "think", "not", and the pronoun "you"
Neuron Alignment
Index   Value   % of L₁
1758    +0.12   0.4%
381     +0.11   0.3%
554     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1415    +0.12      0.04
1919    +0.11      0.05
805     +0.10      0.03
Negative Logits
territo      -1.00
augus        -0.93
lele         -0.86
uhr          -0.85
parlamento   -0.84
parati       -0.83
monaster     -0.81
masaj        -0.81
moza         -0.81
kado         -0.79
Positive Logits
apprehen         0.65
จึง               0.62
hopefully        0.62
unavoid          0.58
naturally        0.57
disreg           0.56
understandably   0.56
unspeak          0.55
presumably       0.55
assume           0.53
Activations Density: 0.141%