Explanations
references to scientific concepts and findings
Head Attribution Weights
Head  Weight
0     0.28
1     0.14
2     0.05
3     0.05
4     0.02
5     0.04
6     0.03
7     0.01
8     0.05
9     0.05
10    0.05
11    0.16
Negative Logits
Token       Logit
ocate       -1.54
etheless    -1.54
akespe      -1.53
版          -1.52
usional     -1.52
TAG         -1.52
cius        -1.45
ospace      -1.45
ocating     -1.45
OTS         -1.44
Positive Logits
Token       Logit
1977        1.62
1993        1.61
Jr          1.53
1996        1.52
1992        1.51
1997        1.51
1998        1.49
1991        1.49
alli        1.48
1972        1.47
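On dashboards of this kind, the positive and negative logit lists are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most extreme tokens. The following is a minimal sketch, assuming that definition. The function name `top_logit_effects`, the toy matrix, and the toy vocabulary are hypothetical. A real dashboard may also apply layer normalization or other preprocessing that is not shown here.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    direction: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical layout: unembed[i][t], d_model rows x vocab columns
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumes the logit effect of a feature on token t is the dot product of
    the feature's decoder direction with column t of the unembedding matrix.
    """
    d_model = len(direction)
    # Logit contribution for each token t: sum_i direction[i] * unembed[i][t].
    logits = [
        sum(direction[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by logit so the most suppressed tokens come first.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy usage with a 2-dimensional "model" and a 3-token vocabulary.
neg, pos = top_logit_effects(
    [1.0, -1.0],
    [[2.0, 0.0, -1.0], [0.0, 1.0, 1.0]],
    ["1977", "Jr", "ocate"],
    k=1,
)
```

In the toy example, `neg` is `[("ocate", -2.0)]` and `pos` is `[("1977", 2.0)]`. This mirrors the shape of the two lists above, but it does not reproduce their values.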
Activation Density: 0.002%
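Activation density is usually reported as the share of token positions on which the feature fires. The following is a minimal sketch, assuming that "fires" means an activation strictly above zero. Both that threshold and the function name `activation_density` are assumptions, not this dashboard's documented procedure.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:  # assumed firing criterion: strictly above threshold
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 2 active positions out of 100,000 tokens gives a density of 0.002%.
density = activation_density([0.0] * 99_998 + [1.3, 0.7])
```

Under this definition, a density of 0.002% corresponds to roughly 2 active positions in every 100,000 tokens, so the feature is extremely sparse.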