INDEX
Explanations
words or terms that start with "tr" and are relevant in various contexts
Neuron Alignment
Index   Value   % of L₁
1323    +0.18   1.1%
1328    +0.17   1.0%
1127    +0.14   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1323    +0.18      0.03
1328    +0.17      0.03
1127    +0.14      0.02
Negative Logits
<bos>       -1.70
vainly      -0.68
endow       -0.60
beheld      -0.60
hasten      -0.59
disarm      -0.59
/***        -0.59
impelled    -0.58
//---       -0.58
gratify     -0.58
Positive Logits
Tru         1.25
tr          1.17
Tru         1.16
Tr          1.16
Tr          1.09
TR          1.00
tr          0.93
Trit        0.91
Truman      0.88
Trujillo    0.87
Activations Density 0.112%