INDEX
Explanations
phrases indicating capability or sufficiency related to reading and comprehension
Head Attr Weights
0:0.01
1:0.01
2:0.15
3:0.05
4:0.15
5:0.02
6:0.06
7:0.24
8:0.02
9:0.04
10:0.15
11:0.04
Negative Logits
sweep          -1.58
Tro            -1.55
rhet           -1.55
annihilation   -1.49
campaign       -1.48
POLIT          -1.45
direct         -1.45
ramid          -1.43
Rog            -1.42
alks           -1.42
Positive Logits
remember       1.83
reckoning      1.71
addon          1.66
cair           1.63
stripes        1.62
recol          1.59
aware          1.59
riter          1.56
comfort        1.55
<=             1.53
Activations Density 0.000%