Explanations
user profiles on an online platform based on the number of posts they have made
instances of the word "Posts" and associated numerical values
Negative Logits
Token             Logit
Bhar              -0.73
Palest            -0.69
ONSORED           -0.66
fitted            -0.65
bid               -0.65
inant             -0.63
ãĥ¼ãĤ¯            -0.62
Horowitz          -0.62
fitting           -0.61
itably            -0.61
Positive Logits
Token             Logit
Posts             1.46
Posts             1.12
ertodd            0.96
posts             0.90
Joined            0.90
reply             0.82
Posted            0.81
Joined            0.77
Offense           0.77
################  0.74
Activations Density 0.008%
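
For context on the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of token positions on which the feature's activation is above zero; the dashboard does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample activations are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is taken to mean the share of tokens on which
    the feature fires at all; the source does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 2 of 25,000 tokens.
acts = [0.0] * 24_998 + [1.3, 0.4]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.008%
```

Under that assumption, a density of 0.008% corresponds to roughly 8 active tokens per 100,000, which fits a feature that responds only to forum-profile text such as "Posts" and "Joined".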