Explanations
proper nouns
mentions of the word "BUR"
Negative Logits
Token        Logit
oral         -0.78
hover        -0.74
Martial      -0.72
wine         -0.72
uve          -0.68
ORGE         -0.66
temper       -0.63
impression   -0.63
Mach         -0.62
seas         -0.62
Positive Logits
Token        Logit
ity          1.16
ities        0.95
acters       0.85
itous        0.84
ITY          0.81
myra         0.80
etsk         0.79
erick        0.78
itar         0.78
isu          0.77
Activations Density 0.030%