INDEX
Explanations
occurrences of the word "unt" or variations of it
Negative Logits
ROME        -0.16
icine       -0.15
leta        -0.15
lette       -0.15
/tos        -0.15
macros      -0.15
rico        -0.14
Brah        -0.14
########    -0.14
agher       -0.14
Positive Logits
ouch        0.34
old         0.32
oward       0.31
ether       0.31
angling     0.30
apped       0.29
idy         0.28
amed        0.27
ainted      0.26
arn         0.26
Activations Density: 0.006%