INDEX
Explanations
conjunctions and phrases related to evaluating or comparing elements
phrases emphasizing conditionality and distinctions between permitted and prohibited actions
Negative Logits
grand       -0.65
rocket      -0.65
crim        -0.61
crim        -0.56
Rover       -0.56
enne        -0.56
staking     -0.56
Detailed    -0.54
cision      -0.53
FORE        -0.53
Positive Logits
how         1.15
what        1.12
why         1.07
whats       0.99
what        0.98
whence      0.95
why         0.94
where       0.92
WHY         0.91
WHAT        0.88
Activations Density: 0.079%