INDEX
Explanations
phrases related to actions, experiences, and thoughts of people
expressions of measurement related to people's actions and outcomes
Negative Logits
Token            Logit
>>>>>>>>         -0.58
Zup              -0.52
theless          -0.52
notwithstanding  -0.50
gallery          -0.48
Marino           -0.48
bilt             -0.46
hov              -0.46
Moder            -0.45
Yon              -0.45
Positive Logits
Token            Logit
actually         0.54
otherwise        0.53
interacted       0.53
currently        0.53
desired          0.52
perceive         0.51
deems            0.50
supposedly       0.50
perce            0.49
congreg          0.49
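
The page does not say how the logit lists are produced. One common approach, assumed here rather than confirmed by the source, is to take the dot product of the feature's output direction with each token's unembedding vector and then rank the tokens by the result. The sketch below illustrates that ranking step on a toy vocabulary; `logit_effects`, `top_logits`, and all numbers in it are hypothetical and are not taken from the lists above.

```python
from typing import Dict, List, Tuple


def logit_effects(direction: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the (assumed) feature direction with each token's unembedding vector."""
    return {
        token: sum(d * w for d, w in zip(direction, vector))
        for token, vector in unembed.items()
    }


def top_logits(effects: Dict[str, float],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda pair: pair[1])
    negative = ranked[:k]          # smallest values first
    positive = ranked[::-1][:k]    # largest values first
    return negative, positive


# Hypothetical toy data: a 2-dimensional direction and a 3-token vocabulary.
direction = [1.0, -1.0]
unembed = {"a": [0.5, 0.0], "b": [0.0, 0.5], "c": [0.25, 0.25]}

effects = logit_effects(direction, unembed)
negative, positive = top_logits(effects, k=1)
print("negative:", negative)
print("positive:", positive)
```

Under this reading, a token with a large positive value is one the feature pushes the model toward predicting, and a token with a large negative value is one it pushes away from.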
Activations Density 0.690%
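
Activation density is usually read as the fraction of sampled tokens on which the feature fires at all. That definition is an assumption here, since the page gives only the number. A minimal sketch of the calculation follows; the helper `activation_density` and the sample values are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds the threshold (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for value in activations if value > threshold)
    return active / len(activations)


# Hypothetical activations for four tokens; only one is non-zero.
sample = [0.0, 0.0, 1.2, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

On that reading, a density of 0.690% would mean the feature is active on roughly 7 of every 1,000 sampled tokens.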