INDEX
Explanations
prepositional phrases starting with 'at'
the phrase "not at" followed by varying intensity levels in different contexts
Negative Logits
FTWARE      -0.81
alpha       -0.76
Russ        -0.71
gravity     -0.71
Lens        -0.67
REDACTED    -0.65
Pac         -0.65
ships       -0.64
vous        -0.63
eria        -0.62
Positive Logits
least       1.41
abase       1.00
onement     0.98
roph        0.93
rial        0.88
halftime    0.85
times       0.81
yp          0.81
ention      0.79
hens        0.79
Activations Density: 0.325%