INDEX
Explanations
descriptions or explanations
instances of the word "described"
Negative Logits
Token               Logit
acus                -0.74
ffic                -0.74
inals               -0.72
PU                  -0.71
assi                -0.70
ammy                -0.70
externalActionCode  -0.68
app                 -0.67
cot                 -0.64
OPA                 -0.64
Positive Logits
Token               Logit
descriptions        0.90
describ             0.79
REDACTED            0.79
describes           0.76
aloud               0.74
urated              0.74
snippets            0.73
traits              0.70
markings            0.70
urations            0.69
Activations Density: 0.021%