Explanations
expressions of intention or purpose
Negative Logits
strap         -0.16
duk           -0.16
ward          -0.15
Hatch         -0.15
istrovství    -0.15
ury           -0.14
rian          -0.14
/img          -0.14
une           -0.13
uber          -0.13
Positive Logits
intent        0.18
odí           0.17
intent        0.16
ogy           0.15
843           0.15
intend        0.15
Loren         0.15
ワイト        0.15
intention     0.15
illusion      0.14
Activations Density 0.120%