Explanations
word tokens related to user profiles
references to social media or public identities
Negative Logits
Token       Logit
EMS         -0.72
hers        -0.71
Kingdoms    -0.68
Cla         -0.66
wic         -0.65
ieves       -0.65
Sew         -0.64
ccoli       -0.62
truth       -0.62
hens        -0.62
Positive Logits
Token       Logit
profile      4.00
profiles     3.05
profile      2.39
Profile      2.38
Profile      2.37
profiling    1.45
biography    1.36
footprint    1.25
portrait     1.20
persona      1.15
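Dashboards of this kind commonly build the two logit lists by projecting the latent's decoder direction through the model's unembedding matrix and reporting the tokens with the largest and smallest values. This page does not state its method, so the following is a minimal sketch under that assumption. The function `top_logit_effects`, its inputs `w_dec` and `w_u`, and the toy six-token vocabulary are all hypothetical.

```python
import numpy as np


def top_logit_effects(w_dec, w_u, vocab, k=10):
    """Rank vocabulary tokens by one latent's direct effect on the logits.

    Assumption: the dashboard's logits are w_dec @ w_u, with no layer norm
    or bias applied. All argument names here are hypothetical.
    w_dec: (d_model,) decoder direction of the latent.
    w_u:   (d_model, d_vocab) unembedding matrix.
    vocab: list of d_vocab token strings.
    """
    logits = w_dec @ w_u                  # (d_vocab,) direct logit effect
    order = np.argsort(logits)            # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: w_dec selects row 0 of w_u, so that row *is* the logit vector.
vocab = ["profile", "profiles", "EMS", "hers", "the", "portrait"]
rng = np.random.default_rng(0)
w_u = rng.normal(size=(4, len(vocab)))
w_u[0] = [4.00, 3.05, -0.72, -0.71, 0.0, 1.20]
w_dec = np.array([1.0, 0.0, 0.0, 0.0])

positive, negative = top_logit_effects(w_dec, w_u, vocab, k=2)
print(positive)  # [('profile', 4.0), ('profiles', 3.05)]
print(negative)  # [('EMS', -0.72), ('hers', -0.71)]
```

Because raw scores are ranked, subword fragments such as "ccoli" or "ieves" can appear in the lists exactly as the tokenizer stores them.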
Activation density: 0.015%
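An activation density of 0.015% corresponds to a fraction of 0.00015, or roughly 15 firing positions per 100,000 tokens. The minimal sketch below assumes density means the share of sampled token positions where the latent's activation exceeds zero; the helper `activation_density` and the toy activation array are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where a latent fires.

    Assumption: "fires" means activation strictly above `threshold`.
    acts: 1-D array of the latent's activations over a token sample.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Toy sample: 100,000 token positions, the latent fires on 15 of them.
acts = np.zeros(100_000)
acts[:15] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.015%
```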