INDEX
Explanations
phrases related to geopolitical events and international relationships, especially involving the U.S.
occurrences of the abbreviation "U.S." followed by a period
Negative Logits
Ellison    -0.60
Dylan      -0.59
·          -0.57
Elvis      -0.57
iceberg    -0.57
Titanic    -0.57
epid       -0.56
cherry     -0.55
Glacier    -0.54
fid        -0.54
Positive Logits
.-     3.60
.–     2.50
.--    2.05
)-     1.79
.—     1.78
.:     1.75
./     1.74
.(     1.64
:-     1.62
.�     1.59
Activations Density 0.021%