INDEX
Explanations
phrases expressing desires or intentions
Negative Logits
UnusedPrivate       -0.64
alors               -0.59
سكانية              -0.58
InstrumentedTest    -0.57
Verkä               -0.56
itarianism          -0.56
LookAnd             -0.56
Expect              -0.56
oredCriteria        -0.55
'{@                 -0.54
Positive Logits
know     1.12
hear     0.83
know     0.80
knows    0.79
Know     0.79
Know     0.78
Known    0.73
Known    0.71
known    0.70
graag    0.70
Activations Density 0.201%