INDEX
Explanations
first-person statements indicating action or sharing information
phrases that include the speaker's perspective or direct address
Negative Logits
ELD         -0.70
ILD         -0.68
NAS         -0.64
bidden      -0.59
Associated  -0.59
inaction    -0.58
ibles       -0.58
required    -0.58
gradient    -0.58
senal       -0.58
Positive Logits
assume      0.99
assure      0.93
emphasize   0.93
clarify     0.92
subscribe   0.90
give        0.89
pray        0.88
suppose     0.87
proceed     0.86
guess       0.84
Activations Density 0.040%
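
As a rough illustration of what a density figure like 0.040% means, the sketch below computes the fraction of token positions on which a feature's activation is above zero. It assumes density is defined as that fraction, which this page does not state. The function name `activation_density` and the sample activations are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption (not confirmed by the source page):
    density = (# positions with activation > threshold) / (# positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which activate.
sample = [0.0] * 9996 + [1.2, 0.3, 2.5, 0.8]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumed definition, a density of 0.040% would correspond to the feature firing on about 4 of every 10,000 token positions.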