Explanations
phrases indicating rationale or justification
phrases emphasizing the justification or rationale behind statements
Negative Logits
Token        Logit
chron        -0.75
chin         -0.68
tein         -0.68
Carbuncle    -0.65
inav         -0.64
semble       -0.64
eg           -0.62
ega          -0.61
Warcraft     -0.60
ages         -0.60
Positive Logits
Token        Logit
why          1.41
WHY          1.19
why          1.17
abl          1.12
Why          0.94
Why          0.88
justifying   0.84
pointers     0.81
Reviewer     0.80
rationale    0.78
Activations Density: 0.037%