Explanations
references to social media activity, particularly sharing of images and captions
Negative Logits
Ri          -0.17
ling        -0.16
etting      -0.16
ãĥĵãĥ¼      -0.16
ụy          -0.15
rips        -0.15
zek         -0.14
constructs  -0.14
Pap         -0.14
icho        -0.14
Positive Logits
Humanities  0.16
ãĥ©ãĥĥãĤ¯   0.15
ftime       0.15
Sensitive   0.14
ÑĤÑĢÑĥ      0.14
938         0.14
iges        0.14
éĿ          0.14
osaur       0.14
cea         0.14
Activations Density: 0.022%