INDEX
Explanations
phrases related to living arrangements and moving in with others
Neuron Alignment
Index   Value   % of L₁
1013    +0.13   0.4%
964     +0.10   0.3%
227     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1166    +0.13      0.04
1013    +0.10      0.04
1336    +0.08      0.04
Negative Logits
provato      -0.62
resear       -0.61
istr         -0.55
erad         -0.55
sclero       -0.54
contex       -0.54
sentito      -0.53
upvoted      -0.53
rispondere   -0.53
pensato      -0.53
Positive Logits
Jusqu            0.76
Toujours         0.72
Leurs            0.68
Malgré           0.65
Mère             0.63
Quelques         0.61
AndEndTag        0.61
roommates        0.59
churrasco        0.59
marginVertical   0.58
Activations Density: 0.260%