INDEX
Explanations
sentences that contain periods, indicating the end of statements
Head Attr Weights
0:0.03
1:0.02
2:0.09
3:0.11
4:0.09
5:0.04
6:0.25
7:0.09
8:0.03
9:0.05
10:0.07
11:0.06
Negative Logits
�� -2.05
ADRA -1.75
�� -1.59
Debor -1.55
フォ -1.53
ORED -1.46
ufact -1.44
�� -1.44
Pru -1.41
submarines -1.41
Positive Logits
1.84
graph 1.61
paren 1.59
document 1.58
parency 1.58
mite 1.54
talk 1.52
wiki 1.49
lab 1.44
research 1.42
Activations Density 0.002%