Explanations
occurrences of identifying verbs and attributes related to personal backgrounds and relationships
Negative Logits
ctrine      -0.17
theses      -0.15
olen        -0.15
itti        -0.14
ãĤĥ         -0.14
etre        -0.14
axies       -0.14
Ŀ           -0.14
ERGE        -0.14
ImageUrl    -0.14
Positive Logits
friends     0.24
a           0.23
part        0.23
from        0.21
older       0.21
an          0.20
related     0.20
younger     0.19
Jewish      0.19
gay         0.18
Activations Density 0.464%