Explanations
Instances of the word "a" followed by various nouns or descriptors.
Negative Logits
ĽĪ              -0.15
ffa             -0.15
شت              -0.15
pires           -0.15
TestCategory    -0.15
dán             -0.14
velt            -0.14
addCriterion    -0.14
pus             -0.14
olas            -0.14
Positive Logits
beating         0.29
liking          0.29
cue             0.29
step            0.26
cues            0.25
stance          0.25
leap            0.24
toll            0.23
look            0.23
stab            0.23
Activations Density: 0.057%