INDEX
Explanations
statements where someone knows something
references to knowledge and awareness
NEGATIVE LOGITS
pending  -0.64
dramatic  -0.60
critical  -0.59
negative  -0.59
additional  -0.59
desk  -0.56
incorporation  -0.56
due  -0.55
optional  -0.55
coming  -0.55
POSITIVE LOGITS
knows  3.34
understands  2.20
knew  2.14
remembers  1.81
know  1.77
learns  1.76
realizes  1.73
know  1.66
thinks  1.64
KNOW  1.63
Activations Density 0.016%