INDEX
Explanations
phrases that stand out or emphasize a specific point
repeated mentions of specific phrases
Negative Logits
Flavoring    -0.78
ÄŁ           -0.76
hari         -0.76
Skydragon    -0.74
DERR         -0.74
elsen        -0.73
ntil         -0.68
gur          -0.66
gewater      -0.64
©¶æ¥µ        -0.64
Positive Logits
phrase       1.10
phrase       1.05
ology        1.05
phrases      1.00
terday       0.90
uttered      0.89
mith         0.82
witz         0.82
naire        0.80
stress       0.78
Activations Density 0.008%