INDEX
Explanations
phrases indicating possession or ownership
Negative Logits
acad          -0.07
578           -0.06
oneself       -0.06
oly           -0.06
borg          -0.06
fo            -0.06
->            -0.05
Gat           -0.05
ing           -0.05
reportedly    -0.05
Positive Logits
bung          0.08
raith         0.07
assin         0.07
elters        0.07
iegel         0.07
ustain        0.07
.Alpha        0.07
Ñıв           0.07
_defs         0.07
onds          0.07
Activations Density: 0.026%