Explanations
the word "Instagram" in texts
mentions of the social media platform Instagram
Negative Logits
Token        Logit
CFR          -0.81
endez        -0.68
quartered    -0.68
riott        -0.67
pneum        -0.63
stood        -0.63
neg          -0.63
ensable      -0.63
vernment     -0.62
ensical      -0.62
Positive Logits
Token        Logit
photos       0.95
mable        0.94
feeds        0.87
(blank)      0.83
mers         0.78
ster         0.73
pics         0.73
shots        0.72
tags         0.72
Url          0.71
Activations Density 0.015%