INDEX
Explanations
text related to instructions or procedures
instances of the word "how"
Negative Logits
Mé          -0.63
receiving   -0.62
critic      -0.61
ub          -0.61
civilian    -0.61
Cortex      -0.61
crus        -0.60
grain       -0.59
�           -0.58
Interior    -0.58
Positive Logits
soever      1.16
ever        1.01
HCR         0.95
bill        0.93
iculty      0.92
how         0.89
nesota      0.80
much        0.80
ゼウス      0.79
eries       0.78
Activations Density: 0.006%