INDEX
Explanations
phrases or sentences indicating a topic or focus of discussion
phrases that indicate discussions about what something is not related to
Negative Logits
Token           Logit
interstitial    -0.92
ãģ®ç            -0.70
awoken          -0.70
nonetheless     -0.69
©¶æ¥µ           -0.67
ãĤ¨             -0.66
âĹ¼             -0.66
pez             -0.65
åĮ              -0.64
00007           -0.63
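The non-Latin entries above (ãģ®ç, ©¶æ¥µ, ãĤ¨, âĹ¼, åĮ) look like mojibake but are most likely byte-level BPE token strings, in which every raw byte is displayed as a printable stand-in character. Assuming a GPT-2-style byte-level tokenizer (an assumption; the page does not name the model), the minimal sketch below rebuilds GPT-2's byte-to-character table and reverses it. The helper names `bytes_to_unicode` and `decode_token` are illustrative, and bytes that do not form a complete UTF-8 character are shown as `\xNN` escapes.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2 byte -> printable character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    # Remaining bytes (control chars, space, etc.) are shifted to 256 + n, in byte order.
    n = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + n)
            n += 1
    return table


# Inverse mapping: displayed character -> raw byte.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into text.

    Tokens that split a multi-byte character keep the dangling bytes
    as \\xNN escapes instead of raising UnicodeDecodeError.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="backslashreplace")


for tok in ["ãģ®ç", "©¶æ¥µ", "ãĤ¨", "âĹ¼", "åĮ"]:
    print(repr(tok), "->", decode_token(tok))
```

Under that assumption, ãĤ¨ decodes to エ, âĹ¼ to ◼, ãģ®ç to の followed by a dangling lead byte (0xE7), ©¶æ¥µ to the tail bytes of 究 followed by 極, and åĮ to the first two bytes of an incomplete character.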
Positive Logits
Token           Logit
necessarily     1.03
nor             0.96
anymore         0.92
mere            0.79
malice          0.79
anything        0.76
merits          0.75
merely          0.74
overtly         0.73
outright        0.71
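The two logit lists are plausibly the feature's direct effect on the output vocabulary. A common way dashboards compute them is to multiply the feature's decoder vector by the model's unembedding matrix and report the most positive and most negative entries. The page does not say which weights were used, so the function `top_logits`, its arguments, and the toy weights below are hypothetical; this is a sketch of that projection, not the dashboard's actual code.

```python
import numpy as np


def top_logits(w_dec_row: np.ndarray, w_u: np.ndarray, vocab: list[str], k: int = 10):
    """Project one feature's decoder direction onto the vocabulary.

    w_dec_row: (d_model,) decoder vector for the feature (hypothetical input).
    w_u:       (d_model, vocab_size) unembedding matrix.
    Returns (positive, negative): lists of (token, logit) pairs, strongest first.
    """
    logits = w_dec_row @ w_u          # (vocab_size,) direct logit contribution
    order = np.argsort(logits)        # ascending order of logit value
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with made-up weights (not the real model's values).
rng = np.random.default_rng(0)
vocab = ["nor", "merely", "awoken", "pez", "anything", "nonetheless"]
w_u = rng.normal(size=(8, len(vocab)))
w_dec_row = rng.normal(size=8)
pos, neg = top_logits(w_dec_row, w_u, vocab, k=3)
print("positive:", pos)
print("negative:", neg)
```

Sorting once with `argsort` and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.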
Activation Density: 0.346%
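Activation density is the share of sampled tokens on which the feature is active, so 0.346% means it fires on roughly 3.5 tokens in every 1,000. A minimal sketch, assuming "active" means an activation strictly above zero (the threshold is an assumption, and `activation_density` is a hypothetical helper):

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds threshold."""
    return 100.0 * float(np.mean(acts > threshold))


acts = np.array([0.0, 0.0, 2.3, 0.0])
print(f"{activation_density(acts):.3f}%")  # 25.000%
```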