INDEX
Explanations
specific mentions of names, titles, or entities
occurrences of specific high-frequency or significant names and terms relevant to current events or notable entities
Negative Logits
bender        -0.67
autistic      -0.63
unc           -0.59
Takeru        -0.58
lest          -0.58
commissions   -0.58
scissors      -0.56
Pixar         -0.56
ogle          -0.55
orc           -0.54
Positive Logits
bilt          1.17
emort         0.90
export        0.82
Leaks         0.74
ennes         0.69
sov           0.69
amins         0.67
sky           0.67
oğ            0.67
idis          0.67
Activations Density 0.428%