Explanations
the name "Bet", at varying activation strengths, most notably in the top-activating examples
occurrences of the word "Bet" in various contexts
Negative Logits
anwhile      -0.85
ĸļ           -0.78
OPLE         -0.77
anguage      -0.68
SPONSORED    -0.65
Flavoring    -0.64
obser        -0.64
IGHTS        -0.63
nesota       -0.61
aughter      -0.60
Positive Logits
terness      1.01
bet          0.99
hesda        0.99
ting         0.98
Bet          0.96
lehem        0.89
tery         0.86
viation      0.84
Bet          0.83
bet          0.82
Activations Density 0.005%
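
The density figure above can be read, under one common interpretation, as the share of sampled tokens on which the feature is active. The page itself does not define the term, so the sketch below rests on that assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature fires; the source page does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 20,000 tokens.
sample = [0.0] * 19_999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

With this made-up sample, one active token in 20,000 corresponds to the same order of sparsity as the figure reported above; the real value depends on the token sample the dashboard used.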