Explanations
mentions of consideration and attention in decision-making contexts
Negative Logits
Token      Logit
aho        -0.20
nul        -0.17
tah        -0.15
iguous     -0.15
uppies     -0.14
okud       -0.14
ër         -0.14
anga       -0.14
uraa       -0.14
reesome    -0.14
Positive Logits
Token      Logit
given      0.36
paid       0.34
accord     0.30
given      0.29
Given      0.27
devoted    0.26
_given     0.26
placed     0.25
Paid       0.24
taken      0.24
Activations Density: 0.105%