INDEX
Explanations
numerical data and statistics related to research studies
Negative Logits
96     -0.22
51     -0.21
52     -0.21
56     -0.20
57     -0.20
54     -0.19
49     -0.19
296    -0.19
97     -0.19
46     -0.19
Positive Logits
650    0.39
620    0.37
610    0.37
600    0.36
680    0.35
612    0.35
611    0.35
640    0.35
660    0.35
690    0.35
Activations Density: 0.143%