INDEX
Explanations
phrases relating to reasons or justifications
"reason" or "reasons"
Negative Logits
Inoue      -0.45
web        -0.43
online     -0.41
at         -0.40
Goldberg   -0.39
immersive  -0.38
multi      -0.38
Colbert    -0.37
dök        -0.36
dark       -0.36
Positive Logits
Reason     1.29
reason     1.26
reason     1.25
Reason     1.23
Reasons    1.19
REASON     1.14
REASON     1.14
reasons    1.13
Reasons    1.13
reasons    1.06
Activations Density: 0.124%