INDEX
Explanations
the definite article "the" and related forms in sentences
Negative Logits
midst        -0.15
forefront    -0.15
quired       -0.15
ecessarily   -0.14
ह            -0.14
ses          -0.14
outset       -0.14
contents     -0.14
398          -0.13
opup         -0.13
Positive Logits
only         0.41
reason       0.36
thing        0.35
oret         0.32
question     0.32
problem      0.32
fact         0.30
truth        0.29
ONLY         0.28
trick        0.28
Activations Density 0.470%