INDEX
Explanations
occurrences of the article 'a' in various contexts
Negative Logits
odore      -0.21
l          -0.20
d          -0.20
sson       -0.19
t          -0.18
soever     -0.17
uada       -0.17
oretical   -0.17
c          -0.16
y          -0.16
Positive Logits
row        0.16
fa         0.15
udent      0.15
κα         0.14
undry      0.14
oma        0.14
رس         0.14
ویی        0.14
idi        0.14
usan       0.14
Activations Density 0.025%