INDEX
Explanations
references to images, photographs, or visual content
Head Attr Weights
0:0.04
1:0.02
2:0.03
3:0.07
4:0.08
5:0.06
6:0.07
7:0.42
8:0.07
9:0.02
10:0.04
11:0.04
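The head attribution weights above can be read programmatically to see which head dominates. The following is a minimal sketch, not part of the original dashboard: it assumes the `index:weight` line format shown above, and the names `parse_head_weights` and `dominant_head` are hypothetical helpers introduced only for illustration.

```python
# Head attribution weights copied verbatim from the listing above,
# one "head_index:weight" pair per line.
HEAD_ATTR_LINES = """0:0.04
1:0.02
2:0.03
3:0.07
4:0.08
5:0.06
6:0.07
7:0.42
8:0.07
9:0.02
10:0.04
11:0.04"""


def parse_head_weights(text):
    """Parse 'index:weight' lines into a dict mapping head index to weight."""
    weights = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        head, value = line.split(":", 1)
        weights[int(head)] = float(value)
    return weights


def dominant_head(weights):
    """Return (head_index, weight) for the head with the largest weight."""
    head = max(weights, key=weights.get)
    return head, weights[head]


weights = parse_head_weights(HEAD_ATTR_LINES)
top_head, top_weight = dominant_head(weights)
total = sum(weights.values())

print(f"head {top_head} has the largest weight: {top_weight:.2f}")
print(f"sum of listed weights: {total:.2f}")
```

With the values listed here, head 7 stands out at 0.42 while every other head is at or below 0.08. The listed weights sum to about 0.96 rather than 1.00, which is consistent with each value having been rounded to two decimals, though the listing itself does not say how the weights were normalized.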
Negative Logits
<|endoftext|>               -2.73
________________            -2.63
Advertisements              -2.62
Logged                      -2.60
NCT                         -2.43
---------                   -2.33
Admin                       -2.31
Views                       -2.30
Moons                       -2.28
________________________    -2.28
Positive Logits
hello       2.74
cow         2.27
pictured    2.27
onge        2.24
Britain     2.20
tto         2.20
raltar      2.19
tyres       2.06
ffield      2.04
style       2.01
Activations Density 0.166%