INDEX
Explanations
phrases related to placing entities in certain positions or situations
phrases that indicate position or status in relation to various contexts
Negative Logits
uttered     -0.66
motions     -0.64
refuses     -0.63
forbids     -0.60
happens     -0.60
manifests   -0.60
(%          -0.59
leys        -0.58
consists    -0.57
wisely      -0.57
Positive Logits
jeopardy    0.94
ワン        0.91
pmwiki      0.79
ゴン        0.74
peril       0.73
unwelcome   0.71
ェ          0.71
unch        0.68
rongh       0.68
exha        0.66
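The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way dashboards compute such a ranking is to take the dot product of the feature's decoder direction with each token's unembedding vector and sort. The sketch below assumes exactly that (it ignores any final layer-norm scaling), and the function names `logit_effects` and `top_and_bottom` are hypothetical, not part of any particular library:

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Direct logit effect of a feature: dot product of its decoder
    direction with each token's unembedding vector (assumed method)."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Toy 2-dimensional example with made-up vectors (illustration only).
toy_unembed = {"peril": [0.8, -0.2], "refuses": [-0.4, 0.6], "wisely": [0.1, 0.1]}
neg, pos = top_and_bottom(logit_effects([1.0, -0.5], toy_unembed), k=1)
print("negative:", neg)
print("positive:", pos)
```

On a real model the vectors have thousands of dimensions and the vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the Python loop; the ranking logic is the same.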
Activations Density 0.186%
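An activation density of 0.186% means the feature fires on roughly 186 of every 100,000 tokens. The sketch below assumes density is defined as the percentage of tokens whose activation is strictly above zero; the function name `activation_density` and the `threshold` parameter are hypothetical:

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`
    (assumed definition of activation density)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)

# One firing token out of 1,000 gives a density of 0.1%.
print(activation_density([0.0] * 999 + [2.5]))
```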