INDEX
Explanations
phrases related to explaining reasons or causes
occurrences of the phrase "part of" and related structures
Negative Logits
Token          Logit
rams           -0.82
ares           -0.73
eches          -0.68
onet           -0.67
orks           -0.66
ynes           -0.65
oons           -0.64
newsletters    -0.64
eder           -0.63
ICES           -0.63
Positive Logits
Token          Logit
reason         1.24
reasoning      0.90
impetus        0.89
why            0.89
explanation    0.88
motivation     0.87
reasons        0.84
blame          0.82
Reason         0.78
frustration    0.77
Activations Density: 0.090%