INDEX
Explanations
sentences containing explanations or descriptions
instances of explaining situations or concepts
Negative Logits
thood     -0.73
iaries    -0.72
luster    -0.70
mage      -0.70
oso       -0.68
ngth      -0.67
essee     -0.65
tackle    -0.64
elight    -0.62
isha      -0.62
Positive Logits
rationale     1.14
reasoning     1.05
why           0.92
why           0.91
concepts      0.90
virtues       0.87
principles    0.87
criteria      0.85
workings      0.85
WHY           0.85
Activations Density 0.267%