Explanations
instances of the word "to" indicating actions or submissions
Negative Logits
ワン  -0.96
覚醒  -0.86
freely  -0.77
グ  -0.75
cheaply  -0.74
leeve  -0.71
bodied  -0.69
accessible  -0.68
furiously  -0.67
565  -0.67
Positive Logits
Michele  0.75
Polit  0.73
us  0.71
me  0.71
POLITICO  0.70
Manny  0.70
Danielle  0.69
Ralph  0.68
Northwestern  0.68
Herb  0.68
Activations Density 0.211%