INDEX
Explanations
the presence and frequency of the word "are."
Negative Logits
zieży        -0.74
verksamhet   -0.67
σία          -0.67
stuff        -0.64
neuem        -0.64
weevil       -0.62
lèvres       -0.61
compos       -0.61
Stuff        -0.60
protos       -0.59
Positive Logits
few          1.11
úgó          1.00
many         0.99
MANY         0.96
fewer        0.92
MANY         0.92
Meksiku      0.91
Fewer        0.89
TagMode      0.89
few          0.86
Activations Density: 0.080%