Explanations
the word "all" appearing with high activation
occurrences of the word "all"
Negative Logits
Token       Logit
aminer      -0.73
potion      -0.72
yip         -0.69
IDS         -0.64
lav         -0.64
zn          -0.64
ker         -0.63
assembly    -0.62
arter       -0.61
oute        -0.61
Positive Logits
Token       Logit
ocating     1.33
kinds       1.17
igators     1.11
sorts       1.10
igator      1.04
iances      1.02
owing       1.02
usions      1.00
ocated      0.94
ocate       0.93
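The two logit lists are the extremes of a single ranking: the tokens whose logits this feature pushes down most and up most. The sketch below shows that selection step only. It assumes the per-token logit effects are already available as a mapping (in many SAE dashboards they come from projecting the feature's decoder direction through the model's unembedding, but that is an assumption here, not something this page states). The helper name `top_logits` is hypothetical, and the sample values are a subset copied from the tables above.

```python
# Hypothetical helper: given a token -> logit-effect mapping, return the k most
# positive and the k most negative entries, in the order a dashboard lists them.
def top_logits(
    effects: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    # Sort ascending by logit effect, so the most negative tokens come first.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # reverse the ranking: most positive first
    return positive, negative


# A few (token, logit) pairs taken from the tables above.
effects = {
    "ocating": 1.33, "kinds": 1.17, "sorts": 1.10,
    "aminer": -0.73, "potion": -0.72, "yip": -0.69,
}
positive, negative = top_logits(effects, 2)
print(positive)  # [('ocating', 1.33), ('kinds', 1.17)]
print(negative)  # [('aminer', -0.73), ('potion', -0.72)]
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a full vocabulary-sized vector, a partial selection (for example a heap) would avoid sorting every token.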
Activation Density: 0.139%
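A density of 0.139% means the feature is active on roughly 139 of every 100,000 tokens, which fits a feature tied to one specific word. The sketch below assumes density is the percentage of tokens whose activation is strictly above a threshold of zero; the dashboard does not state its exact definition or threshold, and the function name `activation_density` is hypothetical.

```python
# Hypothetical sketch: activation density as the percentage of tokens on which
# a feature's activation exceeds a threshold (assumed to be 0.0 by default).
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 139 active tokens out of 100,000 gives the density reported above.
acts = [0.0] * 99_861 + [1.0] * 139
print(activation_density(acts))  # 0.139
```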