Explanations
phrases indicating understanding or comprehension
instances of the word "understands."
Negative Logits
drop         -0.61
bearing      -0.61
resulting    -0.60
«            -0.59
gra          -0.59
Boll         -0.58
bearing      -0.58
spring       -0.58
dubious      -0.57
random       -0.57
Positive Logits
understands   3.53
knows         1.89
understood    1.87
understand    1.80
realizes      1.77
recognizes    1.73
believes      1.60
learns        1.58
Understand    1.52
accepts       1.51
Activations Density: 0.013%