INDEX
Explanations
Frequent occurrences of proper nouns and specific identifiers
Head Attribution Weights
0:0.07
1:0.33
2:0.04
3:0.04
4:0.05
5:0.20
6:0.04
7:0.03
8:0.04
9:0.05
10:0.05
11:0.03
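The twelve weights above sum to about 0.97, which is roughly 1 after rounding. That suggests each value is a head's share of the feature. One way to compute such shares is sketched below. It assumes the feature's decoder row spans the concatenated outputs of all heads, and that a head's share is the L2 norm of its slice divided by the sum of all slice norms. That scheme, the function name, and the toy decoder row are hypothetical illustrations, not taken from this dashboard.

```python
import math

def head_attribution_weights(decoder_row, n_heads):
    """Split a decoder row into equal per-head slices and return each
    slice's share of the summed L2 norms (hypothetical scheme)."""
    if len(decoder_row) % n_heads != 0:
        raise ValueError("decoder_row length must be divisible by n_heads")
    d_head = len(decoder_row) // n_heads
    norms = []
    for h in range(n_heads):
        chunk = decoder_row[h * d_head:(h + 1) * d_head]
        norms.append(math.sqrt(sum(x * x for x in chunk)))
    total = sum(norms)
    return [n / total for n in norms]

# Toy decoder row: 12 heads x 2 dims each (not real weights).
toy_row = [0.3, 0.4] * 12     # every head slice has norm 0.5
toy_row[2:4] = [3.0, 4.0]     # head 1 slice, norm 5.0
toy_row[10:12] = [1.8, 2.4]   # head 5 slice, norm 3.0
weights = head_attribution_weights(toy_row, n_heads=12)
for head, w in enumerate(weights):
    print(f"{head}:{w:.2f}")
```

In this toy row, heads 1 and 5 dominate, mirroring the pattern in the real weights. The printout uses the same `head:weight` layout as the list above.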
Negative Logits
Codec: -2.19
phys: -2.03
Observatory: -2.00
Jew: -1.89
hosp: -1.81
knob: -1.77
Showtime: -1.76
doc: -1.74
SDK: -1.73
docs: -1.71
Positive Logits
ARE: 3.01
older: 2.76
and: 2.73
ands: 2.66
are: 2.63
anders: 2.55
andre: 2.42
â: 2.36
andise: 2.31
OR: 2.31
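Negative and positive logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix. The lowest- and highest-scoring tokens are then read off. The sketch below assumes that procedure and ignores any final LayerNorm. The function name, vocabulary, decoder direction, and unembedding values are toy stand-ins, not this feature's real weights.

```python
def logit_attribution(direction, unembed, vocab, k):
    """Score each token by the dot product of a feature direction with
    that token's unembedding column; return (bottom_k, top_k)."""
    scores = []
    for j, token in enumerate(vocab):
        score = sum(direction[i] * unembed[i][j] for i in range(len(direction)))
        scores.append((token, score))
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    bottom = ranked[:k]                                # most negative first
    top = list(reversed(ranked[-k:]))                  # most positive first
    return bottom, top

# Toy setup: d_model = 2, vocabulary of 4 tokens (hypothetical values).
vocab = ["and", "are", "docs", "SDK"]
# unembed[i][j]: row i = model dimension, column j = token
unembed = [
    [1.0, 0.5, -1.0, -0.5],
    [0.5, 1.0, -0.5, -1.0],
]
direction = [2.0, 1.0]
bottom, top = logit_attribution(direction, unembed, vocab, k=2)
print("Negative:", bottom)
print("Positive:", top)
```

With these toy values, the negative list is `docs`, `SDK` and the positive list is `and`, `are`.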
Activation Density: 0.003%
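Activation density is the fraction of tokens on which the feature fires. At 0.003%, that is about 3 activating tokens per 100,000 (0.003 / 100 = 3 × 10⁻⁵). The minimal sketch below assumes that "fires" means a strictly positive activation. The function name and the toy activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy run: 3 non-zero activations among 100,000 tokens.
acts = [0.0] * 100_000
for idx in (10, 5_000, 99_999):
    acts[idx] = 1.7
print(f"{activation_density(acts):.3f}%")  # prints "0.003%"
```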