INDEX
Explanations
mentions of personal relationships and social interactions
Negative Logits
Token           Logit
establishment   -0.16
icie            -0.16
zia             -0.16
uve             -0.15
avana           -0.15
shocked         -0.15
ìĥģ             -0.15
sak             -0.15
432             -0.15
astonished      -0.14
Positive Logits
Token           Logit
thinking        0.19
éİ®             0.16
847             0.15
ãĥ³ãĥĸ          0.15
thinking        0.15
zano            0.14
aland           0.14
started         0.14
started         0.14
involved        0.14
Activations Density: 0.053%