INDEX
Explanations
phrases indicating logical or justifiable explanations or causes
phrases indicating justification or rationale
Negative Logits
semble       -0.83
chin         -0.77
Carbuncle    -0.70
oba          -0.70
Ping         -0.64
ega          -0.63
Territories  -0.62
rongh        -0.61
inav         -0.61
eg           -0.61
Positive Logits
why            0.92
justifying     0.91
whatsoever     0.85
pointers       0.84
Reviewer       0.81
abl            0.76
justify        0.74
justification  0.73
forward        0.73
WHY            0.72
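The page does not say how these logit lists are produced. A common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and rank the vocabulary by the result. The sketch below uses a hypothetical toy decoder vector, unembedding matrix, and three-token vocabulary; the function names are illustrative, not from any real library.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column.

    Assumption: `unembed` is d_model x vocab_size, so unembed[i][t] is the
    weight from model dimension i to vocabulary token t.
    """
    effects: Dict[str, float] = {}
    for t, token in enumerate(vocab):
        effects[token] = sum(decoder_dir[i] * unembed[i][t]
                             for i in range(len(decoder_dir)))
    return effects

def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return positive, negative

# Hypothetical toy data: 2-dimensional model, 3-token vocabulary.
decoder_dir = [1.0, 0.0]
unembed = [[0.9, -0.8, 0.1],
           [0.0, 0.0, 0.0]]
vocab = ["why", "semble", "the"]

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=1)
print(positive, negative)
```

With real weights, the dimensions would be the model's hidden size and full vocabulary, and k would be 10 to match the lists above.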
Activation Density: 0.026%
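Activation density is read here as the percentage of tokens on which the feature fires. The dashboard does not state its threshold or token sample, so the sketch below assumes "fires" means an activation strictly above zero. Under that reading, 0.026% corresponds to roughly 26 activating tokens per 100,000. The function name and the toy activation list are hypothetical.

```python
from typing import Sequence

def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

# Hypothetical toy data: 26 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(26):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3f}%")
```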