Explanations
phrases or terms related to understanding and comprehension
Negative Logits
Token       Logit
onies       -0.82
gob         -0.76
hire        -0.75
ONS         -0.74
unal        -0.71
ovie        -0.68
arious      -0.65
exting      -0.64
iere        -0.64
-+-+-+-+    -0.64
Positive Logits
Token           Logit
how             1.18
why             1.11
WHY             1.10
whats           0.94
why             0.94
what            0.85
HOW             0.85
Understanding   0.84
how             0.78
nuances         0.77
Activations Density 0.041%