INDEX
Explanations
references to personal identity or possession
references to a specific person's name or identity
Negative Logits
س          -0.74
INO        -0.74
à¼         -0.73
itect      -0.70
ipation    -0.69
itud       -0.68
Sakuya     -0.67
wav        -0.67
TRANS      -0.66
1080       -0.65
Positive Logits
anmar      1.28
stery      1.25
self       1.21
ths        1.13
riad       1.09
selves     1.07
own        1.00
apologies  0.95
Own        0.88
stic       0.87
Activations Density 0.033%