Explanations
articles ('a', 'an') followed by a strongly activating word
the word "a" in various contexts
Negative Logits
Jagu          -0.81
Contents      -0.81
Edit          -0.78
ガ            -0.78
Ingredients   -0.74
Features      -0.72
onto          -0.72
Ont           -0.71
chuk          -0.71
Catal         -0.70
Positive Logits
lot            1.16
couple         1.08
cknowled       1.05
few            1.04
bunch          1.03
handful        0.99
huge           0.94
rouse          0.94
combination    0.94
typical        0.93
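Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking every vocabulary token by the result. The page does not state its procedure, so the sketch below assumes that one. `logit_effects` and `top_and_bottom` are hypothetical helpers, and the 2-dimensional weights are toy values that do not reproduce the numbers above.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Project a feature's decoder direction through the unembedding.

    Assumes W_U is shaped d_model x vocab_size, so
    effect[j] = sum_i decoder_dir[i] * W_U[i][j].
    """
    d_model = len(decoder_dir)
    return {
        tok: sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    # Most negative first, matching the "Negative Logits" ordering.
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens (hypothetical weights).
    vocab = ["lot", "few", "Edit", "Jagu"]
    W_U = [[1.0, 0.8, -0.6, -0.7],
           [0.2, 0.4, -0.2, -0.2]]
    effects = logit_effects([1.0, 0.5], W_U, vocab)
    pos, neg = top_and_bottom(effects, 2)
    print("positive:", [tok for tok, _ in pos])
    print("negative:", [tok for tok, _ in neg])
```

Sorting the full dictionary twice keeps the sketch simple; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.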
Activation Density: 0.859%
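The density figure can be read as the percentage of token positions on which the feature's activation is above zero. That definition is an assumption, since the page does not spell it out, and `activation_density` is a hypothetical helper, not part of any dashboard API.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires above threshold.

    Assumes "density" means the fraction of positions with activation > 0.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy data: the feature fires on 3 of 1000 positions.
    acts = [0.0] * 997 + [2.3, 0.7, 1.1]
    print(f"{activation_density(acts):.3f}%")
```

Under this reading, a density of 0.859% means the feature is active on roughly 1 in 116 tokens.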