INDEX
Explanations
the word "Heritage" at various activation levels
mentions of the Heritage Foundation
Negative Logits
Token       Logit
redd        -0.71
agram       -0.70
tered       -0.70
orders      -0.67
vern        -0.66
unsub       -0.66
odiac       -0.65
gradient    -0.65
ching       -0.65
sie         -0.63
Positive Logits
Token         Logit
Heritage      1.16
conservancy   1.05
itage         0.89
Foundation    0.81
Collection    0.77
Institutes    0.74
icity         0.72
Dictionary    0.72
Institute     0.72
Values        0.71
Activations Density: 0.008%