INDEX
Explanations
numeric values in the format "x.x", likely related to finance or technical data
instances of the number seven
Negative Logits
Token       Logit
lett        -0.69
ificate     -0.64
medicine    -0.62
rule        -0.59
tarn        -0.58
safety      -0.58
mutually    -0.58
utter       -0.57
decor       -0.57
tell        -0.57
Positive Logits
Token       Logit
7           3.17
8           2.43
6           2.39
9           2.37
5           2.30
4           1.97
3           1.94
747         1.74
2           1.69
733         1.67
Activations Density: 0.027%