INDEX
Explanations
phrases related to causal relationships or explanations
phrases that indicate reasons or explanations for various situations
Negative Logits
Token        Logit
deen         -0.75
elf          -0.75
ocre         -0.74
actions      -0.69
tg           -0.69
eno          -0.68
action       -0.66
roup         -0.65
ateur        -0.64
Specialist   -0.64
Positive Logits
Token          Logit
impetus        1.47
catalyst       1.39
inspiration    1.28
basis          1.28
justification  1.23
reason         1.20
backdrop       1.18
pretext        1.18
foundation     1.17
motivation     1.15
Activations Density 0.325%