INDEX
Explanations
references to significant items or phenomena in a context suggesting analysis or evaluation
Head Attr Weights
0:0.39
1:0.03
2:0.10
3:0.07
4:0.04
5:0.04
6:0.03
7:0.04
8:0.04
9:0.05
10:0.09
11:0.03
Negative Logits
pods        -2.50
Brave       -2.23
romeda      -2.19
upd         -2.14
Gy          -2.14
rons        -2.13
byss        -2.10
experiment  -2.07
mates       -2.05
Cran        -2.04
Positive Logits
references   3.95
reference    3.61
reference    3.49
referencing  3.45
mentions     3.34
Reference    3.19
Reference    3.05
References   3.04
mentioning   2.99
referenced   2.95
Activations Density 0.002%