Explanations
occurrences of the token "omm" at various activations
occurrences of the substring "omm" within words
Negative Logits
Token            Logit
ted              -0.88
ting             -0.86
BOOK             -0.75
footed           -0.70
JUST             -0.70
Hindus           -0.67
TRUMP            -0.66
Charges          -0.65
realDonaldTrump  -0.64
FORE             -0.62
Positive Logits
Token            Logit
obile            1.17
ittee            1.17
orrow            1.15
essage           1.14
ission           0.98
ando             0.98
acent            0.97
orr              0.95
useum            0.95
ajor             0.93
Activations Density: 0.007%