Explanations
references to personal ownership or individuality
Negative Logits
itself      -0.20
Himself     -0.16
himself     -0.16
herself     -0.16
oneself     -0.15
strand      -0.15
Dump        -0.14
Mansion     -0.14
776         -0.14
conte       -0.14
Positive Logits
version     0.17
ledge       0.17
-brand      0.16
ständ       0.16
PFN         0.15
respective  0.15
agher       0.15
971         0.15
-таки       0.14
ERGY        0.14
Activations Density 0.037%