INDEX
Explanations
names starting with "Osa" at different activation levels
occurrences of the substring "osa" within words
Negative Logits
ERAL      -0.83
doms      -0.80
rations   -0.76
sheet     -0.74
rics      -0.74
bler      -0.74
rary      -0.74
rator     -0.70
taking    -0.69
liest     -0.68
Positive Logits
osa       1.03
Luxem     1.02
qua       0.94
que       0.94
velength  0.87
isy       0.84
hea       0.84
ña        0.82
ques      0.81
uce       0.80
Activations Density: 0.016%