INDEX
Explanations
user profile information
mentions of user profiles
Negative Logits
Token     Logit
abeth     -0.78
chest     -0.68
ows       -0.66
rought    -0.62
vest      -0.62
bled      -0.62
isen      -0.62
lins      -0.61
phas      -0.61
ilk       -0.61
Positive Logits
Token     Logit
Profile   1.30
profiles  0.92
iership   0.87
Joined    0.87
allery    0.86
profile   0.86
Seym      0.86
pedia     0.82
Profile   0.76
Features  0.74
Activations Density: 0.005%