INDEX
Explanations
locations mentioned with a high level of activation
instances of the word "at" used in various contexts
Negative Logits
FTWARE      -0.76
alpha       -0.70
gravity     -0.69
Russ        -0.67
REDACTED    -0.65
HTTP        -0.63
istar       -0.62
Lens        -0.62
WRITE       -0.61
PE          -0.61
Positive Logits
least       1.27
onement     0.98
abase       0.94
halftime    0.84
rial        0.82
roph        0.80
times       0.72
liberty     0.71
oned        0.70
las         0.70
Activations Density 0.242%