Explanations
phrases beginning with a specific word
instances of the word "at" indicating specific locations or times
Negative Logits
gravity     -0.78
FTWARE      -0.76
alpha       -0.72
fill        -0.68
Guest       -0.63
Pac         -0.60
istar       -0.60
Lens        -0.60
Russ        -0.59
REDACTED    -0.59
Positive Logits
least        1.25
onement      0.96
halftime     0.96
abase        0.89
hens         0.85
roph         0.82
dusk         0.76
rial         0.74
variance     0.73
sunset       0.70
Activation density: 0.242%