INDEX
Explanations
phrases indicating possession or ownership
Negative Logits
not        -0.08
nis        -0.07
ed         -0.07
ying       -0.07
not        -0.06
ase        -0.06
ual        -0.06
aksi       -0.06
toy        -0.06
oint       -0.06
Positive Logits
anymore     0.08
'gc         0.08
exact       0.08
necessarily 0.07
vetica      0.07
iband       0.07
Arbitrary   0.07
/upload     0.06
’ta         0.06
lom         0.06
Activations Density 0.021%