INDEX
Explanations
strong emotional reactions or actions
terms related to social media interactions and misunderstandings
Negative Logits
ULTS      -0.72
uana      -0.71
phony     -0.69
pite      -0.67
metry     -0.67
idency    -0.66
millenn   -0.62
pine      -0.62
asio      -0.61
wives     -0.61
Positive Logits
ed        2.05
edIn      1.44
ing       1.39
edly      1.27
er        1.06
sites     1.03
ery       1.01
es        0.98
ers       0.97
esy       0.96
Activations Density 0.112%