INDEX
Explanations
words posing questions or raising issues
repeated phrases or questions about "the" and its associated context
Negative Logits
icism       -0.85
cation      -0.81
matter      -0.81
leness      -0.81
iliation    -0.80
rama        -0.80
cture       -0.79
obook       -0.79
analysis    -0.78
hyde        -0.77
Positive Logits
indications  1.16
exceptions   1.07
signs        1.06
words        1.04
constants    1.04
truths       1.00
kinds        0.99
moments      0.99
nods         0.98
keys         0.98
Activations Density: 0.176%