INDEX
Explanations
phrases related to social media actions
occurrences of the word "blocked" and its variations
Negative Logits
shown        -0.78
rift         -0.76
present      -0.73
eah          -0.73
nucleus      -0.71
kind         -0.69
gain         -0.68
matically    -0.67
pleasing     -0.67
rious        -0.66
Positive Logits
ãĤ´ãĥ³       0.80
ĵĺ           0.76
Lists        0.63
ogging       0.63
iard         0.63
Susp         0.62
adoes        0.61
ishment      0.61
pedia        0.61
buster       0.60
Activations Density 0.026%