Explanations
imperative verbs or commands
Negative Logits
Token     Logit
Schmid    -0.54
spedes    -0.53
ising     -0.53
Fong      -0.51
avas      -0.49
Fabian    -0.49
uris      -0.49
amond     -0.48
asan      -0.48
idal      -0.48
Positive Logits
Token     Logit
Take      1.56
take      1.54
take      1.52
Take      1.49
TAKE      1.32
TAKE      1.22
takes     0.92
takes     0.91
taken     0.83
taken     0.83
Activations Density 0.017%
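The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of sampled token positions at which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The dashboard itself does not define the term.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 17 of 100,000 sampled tokens.
sample = [0.0] * 99_983 + [1.2] * 17
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.017%
```

Under this reading, a density of 0.017% would mean the feature is active on roughly 17 of every 100,000 tokens, which fits a narrow feature tied to forms of a single verb.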