INDEX
Explanations
phrases related to reasons, motives, or causes
questions and statements about reasons, importance, and the nature of situations
Negative Logits
sqor          -0.66
jri           -0.66
abase         -0.64
iewicz        -0.63
dismant       -0.61
fulfil        -0.61
premises      -0.60
feasibility   -0.59
simulac       -0.58
Ala           -0.58
Positive Logits
Reviewer      0.92
forth         0.85
bothering     0.84
ãĤ»           0.83
so            0.79
bother        0.79
reluctant     0.77
singled       0.77
persist       0.77
hesitant      0.75
Activations Density: 0.169%