INDEX
Explanations
numerical identifiers followed by a specific symbol
instances of hashtags and post metadata
Negative Logits
colle     -0.76
acknow    -0.76
ンジ      -0.75
reve      -0.72
ahime     -0.72
weap      -0.70
veh       -0.69
nown      -0.67
ーテ      -0.65
cane      -0.65
Positive Logits
########                          1.19
################################  1.18
################                  1.09
###                               0.89
nice                              0.87
Posts                             0.81
region                            0.80
DIV                               0.79
Reply                             0.77
##                                0.74
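The page does not say how the two logit lists are computed. A common approach for sparse-autoencoder features is to multiply the feature's decoder direction by the model's unembedding matrix and rank the resulting per-token values. The sketch below assumes exactly that. The function name `logit_effects`, the argument shapes (`W_U` as d_model rows by vocab_size columns), and the toy numbers are hypothetical and are not the dashboard's own code or data.

```python
# Hypothetical sketch, not the dashboard's actual code. Assumes a feature's
# logit effects are its decoder direction multiplied by the unembedding
# matrix W_U (d_model rows x vocab_size columns), then ranked.
from typing import List, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: List[float],
                  W_U: List[List[float]],
                  vocab: List[str],
                  k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and k most negative (token, logit) pairs."""
    if len(W_U) != len(decoder_dir):
        raise ValueError("W_U must have one row per decoder dimension")
    # Direct effect on each token's logit: dot(decoder_dir, column j of W_U).
    logits = [sum(w * row[j] for w, row in zip(decoder_dir, W_U))
              for j in range(len(vocab))]
    pairs = list(zip(vocab, logits))
    top_pos = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    top_neg = sorted(pairs, key=lambda p: p[1])[:k]
    return top_pos, top_neg


if __name__ == "__main__":
    # Toy values for illustration only (d_model = 2, vocab_size = 4).
    vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
    W_U = [[1.0, 0.5, -0.5, -1.0],
           [0.0, 1.0, -1.0, 0.0]]
    pos, neg = logit_effects([0.8, 0.3], W_U, vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Under this assumption, a token appears in the positive list when writing the feature's direction into the residual stream directly raises that token's logit. Because the computation skips any later layers or normalization, these values describe only a direct effect, not the full downstream behavior.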
Activation Density: 0.011%
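For scale, a density of 0.011% corresponds to about 11 firing positions per 100,000 tokens, or roughly one token in 9,000. The page does not define the term. The sketch below assumes density is the percentage of token positions where the feature's activation is strictly above a threshold of zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
# Hypothetical sketch: assumes "activation density" is the percentage of
# token positions where the feature's activation is strictly above zero.
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of positions whose activation exceeds `threshold`."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


if __name__ == "__main__":
    # Toy data: 11 firing positions out of 100,000 tokens.
    acts = [1.0] * 11 + [0.0] * 99_989
    print(f"{activation_density(acts):.3f}%")  # prints "0.011%"
```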