INDEX
Explanations
phrases indicating a change or departure from established norms or past behaviors
Head Attr Weights
0:0.01
1:0.02
2:0.08
3:0.06
4:0.12
5:0.03
6:0.05
7:0.37
8:0.03
9:0.03
10:0.05
11:0.08
Negative Logits
Rated:-1.54
rice:-1.43
Increase:-1.41
TB:-1.37
�:-1.35
Crunch:-1.32
toget:-1.30
ractor:-1.29
roof:-1.29
CPU:-1.29
Positive Logits
uart:1.55
ansk:1.41
typ:1.41
pta:1.38
precon:1.36
conformity:1.36
spont:1.36
continuity:1.34
amples:1.32
Luxem:1.31
Activations Density 0.002%