INDEX
Explanations
phrases related to reasoning or explanation
the word "why" or variations of it to indicate reasoning or justification
Negative Logits
MM           -0.68
tuber        -0.67
phys         -0.62
Ranger       -0.61
polymorph    -0.61
Winged       -0.60
Suzuki       -0.60
skelet       -0.58
Juda         -0.58
Roller       -0.58
Positive Logits
why          1.02
soever       0.89
why          0.87
forward      0.82
WHY          0.79
ioned        0.77
ratulations  0.71
utterstock   0.68
iatus        0.68
ctuary       0.68
Activations Density 0.026%