INDEX
Explanations
phrases and context related to background information
Head Attr Weights
0:0.01
1:0.02
2:0.05
3:0.05
4:0.09
5:0.02
6:0.02
7:0.50
8:0.02
9:0.02
10:0.07
11:0.07
Negative Logits
�:-1.58
divid:-1.45
paws:-1.45
fits:-1.43
bands:-1.43
oy:-1.41
stretched:-1.41
phies:-1.40
houses:-1.40
array:-1.38
Positive Logits
researching:1.66
Alchemist:1.61
Philosophy:1.55
cius:1.52
sourcing:1.51
biology:1.51
causation:1.49
resear:1.47
basics:1.47
researched:1.45
Activations Density 0.009%