Explanations
HTML document structure elements, specifically the `<head>` tag
Negative Logits
Token     Logit
je        -0.57
most      -0.55
plus      -0.53
dis       -0.51
(blank)   -0.48
ris       -0.48
Go        -0.48
Dis       -0.46
go        -0.46
cosa      -0.46
Positive Logits
Token     Logit
head      2.72
heads     2.21
HEAD      2.19
Head      1.37
headed    1.18
Heads     1.14
ヘッド      1.03
HEAD      1.00
heading   1.00
Heads     0.88
Activations Density: 0.136%