Explanations
actions or tasks that lead to a specific outcome
conjunctions indicating purpose or intent in statements
Negative Logits
theless      -0.62
eni          -0.60
wire         -0.59
degree       -0.59
egal         -0.58
ropolitan    -0.57
REL          -0.56
wick         -0.55
ascus        -0.55
ãģ£          -0.55
Positive Logits
othe         1.08
othes        1.02
apy          0.94
they         0.93
oner         0.89
that         0.86
bered        0.82
we           0.81
aps          0.80
nobody       0.80
Activations Density 0.070%