INDEX
Explanations
phrases indicating difficulty in comprehension or needing an explanation
instances of the word "understand."
Negative Logits
picking        -0.74
vertisement    -0.74
onies          -0.73
boro           -0.69
powder         -0.65
inating        -0.65
onto           -0.64
ãĤµ            -0.64
hire           -0.64
stead          -0.62
Positive Logits
why            1.46
WHY            1.36
how            1.26
why            1.21
what           0.99
whats          0.95
HOW            0.87
ably           0.86
firsthand      0.82
nuances        0.81
Activations Density 0.072%
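
As a rough illustration of how a density figure like the one above could be derived, the sketch below assumes that density means the fraction of token positions where the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: density = (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 7 of them with a nonzero activation.
acts = [0.0] * 9993 + [0.5, 1.2, 0.1, 2.3, 0.8, 0.4, 3.1]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.070%
```

A density this low would mean the feature fires on fewer than one token in a thousand, which fits a feature tied to a narrow set of question words such as "why" and "how".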