INDEX
Explanations
instances of the word "is" and various punctuation marks
Head Attr Weights
0:0.06
1:0.06
2:0.04
3:0.07
4:0.28
5:0.07
6:0.07
7:0.08
8:0.04
9:0.05
10:0.06
11:0.07
Negative Logits
Quote      -3.15
Here       -2.99
Beet       -2.96
Judging    -2.83
Assuming   -2.77
Suppose    -2.69
Enter      -2.63
Helpful    -2.60
Click      -2.59
Reading    -2.58
Positive Logits
OPS        3.40
INTER      3.04
FSA        2.96
weap       2.88
scrim      2.83
frail      2.79
UNHCR      2.73
reluct     2.67
SOU        2.66
exting     2.61
Activations Density 0.000%