Explanations
phrases that indicate exploration or immersion into a subject or experience
Negative Logits
ingly      -0.15
hoff       -0.15
distance   -0.14
922        -0.14
ultipart   -0.14
830        -0.14
Distance   -0.14
CLE        -0.13
444        -0.13
åĿª        -0.13
Positive Logits
rodu          0.15
Kremlin       0.14
Waters        0.14
é»ĺ           0.14
erman         0.14
atat          0.14
ÙħعÙĦÙĪÙħات  0.14
éŀ            0.13
sul           0.13
geois         0.13
Activations Density: 0.026%