INDEX
Explanations
phrases related to urging or recommending action or information
references to individuals or the concept of "anyone" in various contexts
Negative Logits
adequ    -0.62
irth     -0.62
ffic     -0.61
heny     -0.60
ocamp    -0.59
ories    -0.59
bows     -0.59
pa       -0.58
itals    -0.58
Ma       -0.58
Positive Logits
else          1.57
THING         1.12
Else          1.12
else          1.08
Else          1.04
soever        0.89
doubted       0.89
body          0.89
imaginable    0.88
20439         0.87
Activations Density: 0.019%
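
The page does not say how the density figure is computed. One common reading is the fraction of sampled token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 2 of which activate the feature.
sample = [0.0] * 9998 + [3.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.019% would correspond to roughly 19 active positions per 100,000 sampled tokens.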