INDEX
Explanations
phrases that indicate centrality or importance in various contexts
Head Attribution Weights
Head 0: 0.01
Head 1: 0.01
Head 2: 0.07
Head 3: 0.06
Head 4: 0.15
Head 5: 0.02
Head 6: 0.03
Head 7: 0.43
Head 8: 0.02
Head 9: 0.02
Head 10: 0.05
Head 11: 0.07
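The attribution is heavily concentrated on head 7 (0.43), with head 4 (0.15) a distant second. The minimal sketch below reads the displayed values and reports the dominant head and the total. The displayed weights sum to about 0.94 rather than 1; whether that comes from two-decimal rounding or from a normalization other than summing to 1 is an assumption the dashboard does not settle. The helper names are hypothetical.

```python
# Head attribution weights copied from the dashboard above (head index -> weight).
HEAD_ATTR_WEIGHTS = {
    0: 0.01, 1: 0.01, 2: 0.07, 3: 0.06, 4: 0.15, 5: 0.02,
    6: 0.03, 7: 0.43, 8: 0.02, 9: 0.02, 10: 0.05, 11: 0.07,
}


def dominant_head(weights):
    """Return (head, weight) for the head with the largest attribution weight."""
    return max(weights.items(), key=lambda kv: kv[1])


def total_weight(weights):
    """Sum of the displayed weights, rounded to two decimals.

    Assumption: the values are rounded fractions, so the total need not be exactly 1.
    """
    return round(sum(weights.values()), 2)


head, w = dominant_head(HEAD_ATTR_WEIGHTS)
print(f"dominant head: {head} ({w:.2f})")  # dominant head: 7 (0.43)
print(f"sum of displayed weights: {total_weight(HEAD_ATTR_WEIGHTS):.2f}")  # 0.94
```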
Negative Logits
erity        -1.98
��           -1.72
augh         -1.68
osponsors    -1.65
>>\          -1.61
bye          -1.60
RANT         -1.59
YR           -1.53
blers        -1.51
alf          -1.50
Positive Logits
discussions    1.73
discussion     1.60
deliberations  1.55
determining    1.55
defining       1.52
scientific     1.51
scientific     1.51
QC             1.46
evidence       1.44
science        1.43
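On dashboards of this kind, the negative and positive logit lists are typically the feature's direct effect on the output vocabulary: its output direction is projected through the unembedding matrix and the tokens are sorted by the resulting logit. The sketch below assumes exactly that (a single output direction in the residual stream, no final LayerNorm correction); the function name, inputs, and toy numbers are hypothetical and are not this feature's actual data.

```python
import numpy as np


def top_and_bottom_tokens(direction, W_U, vocab, k=10):
    """Rank vocabulary tokens by the logit a feature direction contributes.

    direction: (d_model,) feature output direction (hypothetical input)
    W_U:       (d_model, vocab_size) unembedding matrix (hypothetical input)
    vocab:     list of token strings, length vocab_size
    """
    logits = direction @ W_U  # (vocab_size,) direct logit effect per token
    order = np.argsort(logits)  # ascending
    bottom = [(vocab[i], float(logits[i])) for i in order[:k]]
    top = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return bottom, top


# Toy example: 2-dim residual stream, 5-token vocabulary.
direction = np.array([1.0, 0.5])
W_U = np.array([
    [1.0, 0.8, 0.0, -1.0, -1.5],
    [0.6, 0.4, 0.2, -0.2, -0.4],
])
vocab = ["discussion", "evidence", "science", "bye", "augh"]
bottom, top = top_and_bottom_tokens(direction, W_U, vocab, k=2)
print("negative:", bottom)
print("positive:", top)
```

Because the list is a plain sort over the vocabulary, tokens that render identically (for example, variants differing only in leading whitespace) can appear as separate entries with equal-looking strings.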
Activation Density: 0.003%
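Activation density is commonly reported as the fraction of sampled tokens on which the feature activates (activation above zero), so 0.003% corresponds to roughly 3 tokens in every 100,000. The sketch below assumes that definition and a threshold of 0; the dashboard does not state its threshold or sample size, and the toy activations are invented for illustration.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which a feature fires (activation > threshold).

    Assumption: density is measured as a simple firing rate over a token sample.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Toy activations: the feature fires on 3 of 100,000 tokens.
acts = np.zeros(100_000)
acts[[10, 500, 9_999]] = [0.8, 1.2, 0.3]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.003%
```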