Explanations
Instances of the phrase "just one example of".
Negative Logits
Token       Logit
\\\\\\\\    -0.77
adolesc     -0.67
recomm      -0.67
OWS         -0.66
baugh       -0.65
��          -0.65
Hayden      -0.65
owed        -0.64
ANE         -0.60
tatt        -0.60
Positive Logits
Token       Logit
lev         0.75
low         0.66
ⓘ           0.63
rab         0.63
ロ          0.61
assemb      0.61
Explosive   0.60
low         0.60
lean        0.60
eele        0.59
Activation density: 0.021%